(U) EXECUTIVE SUMMARY:


(S/NF/SG/LIMDIS) In compliance with the Congressional 
conferees' request (Appendix A), DIA proposes to develop a multi-
year research and development program, subject to rigorous
scientific and technical oversight, to demonstrate the scientific 
validity of the STAR GATE program, and that results of military 
and intelligence value can be obtained in a cost-effective manner 
using anomalous mental phenomena (AMP). 


(S/NF/SG/LIMDIS) This proposed program, if successfully 
implemented, will: 


- Identify the underlying mechanisms of AMP. 


- Establish the limits of operational usefulness of 
AMP. 


- Determine the degree to which foreign activities in
AMP represent a threat to national security.


- Lead to the development of countermeasures to 
neutralize this threat. 


- Use research findings to improve operational 
activities. 


- Develop data fusion criteria to integrate AMP results 
with other intelligence sources. 


(S/NF/SG/LIMDIS) Due to the diversity of the STAR GATE
mission/objectives, both external resources and in-house
expertise are required. Since this Activity possesses no in-house
R&D capability, external R&D support is essential to meet the
Congressional concerns addressed in this program plan. A balance
will be maintained between external and in-house activities, and
every effort will be made to integrate and link these activities
where appropriate. The external aspect permits a wide range of
expertise covering many disciplines to be focused on this area;
it also ensures peer group review and facilitates a variety of
scientific interactions. In-house personnel with a wide range of
expertise in this phenomenology will need to be retained to make
this proposed plan work.


(S/NF) In order to review the major tenets of the draft 
program plan, the Defense Intelligence Agency will convene a 
panel of appropriate scientists to provide recommendations on the 
plan and the research it achieves. Based on the panel's 
recommendations, the Defense Intelligence Agency will then submit 
a budget line item to fund those approved objectives. 


SECRET 
NOT RELEASABLE TO FOREIGN NATIONALS 
STAR GATE 
LIMDIS 


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 


(C) An annual report will document the current 
operational, technical and administrative status of the program.


I. (U) INTRODUCTION: 


(S/NF/SG/LIMDIS) This program plan was developed in 
response to a Defense Authorization Conference, Congressionally 
Directed Action (CDA) to prepare a long-term systematic and 
comprehensive research and peer review plan in order to 
investigate anomalous mental phenomena (AMP), and to apply 
program research results to potential operational activities. 
This plan also describes key in-house activities along with an 
appropriately integrated basic and applied external research 
support effort. 


(S/NF/SG/LIMDIS) Specifically, this program plan 
represents DIA's view on how best to proceed with both in-house 
activities and external research support for the period of FY95 
through FY99. Research findings, both domestic and foreign, and
results from operational activities may lead to updates of this
plan in order to reflect improved phenomena understanding and to
pursue follow-on research and/or application directions.


(S/NF/SG/LIMDIS) An underlying and fundamental premise
governing the implementation of this program plan is that a
well-integrated interdisciplinary approach is the most
appropriate strategy for conducting research in this diverse
field. Consequently, this plan includes a wide variety of
research topics based on recent findings from leading-edge
pursuits in other disciplines that are suspected of being germane
to STAR GATE. Other topics are derived from a review of worldwide
research, consultations with leading area experts, and insights
gained from previous research and application activities
associated with the STAR GATE program.


(S/NF/SG/LIMDIS) This program plan also allows the STAR GATE
program to show results that are cost-effective and that at the
same time satisfy reasonable program performance criteria.
Implementing this plan will preclude recurrence of the yearly
cycle of project start-up, limited progress, and anticipated
project shut-down that previously inhibited program activity.


(S/NF/SG/LIMDIS) In sum, the implementation of this
research and peer review plan will allow DIA to successfully
accomplish identified R&D activities which, in turn, will enhance
the capability of STAR GATE personnel to engage in operational
activities and to assess the work done by potential adversaries,
thereby reducing the risk of technological surprise.


(U) Terminology and definitions are discussed at
Appendix B.


II. (U) PLAN OBJECTIVES:


(S/NF/SG/LIMDIS) The objective of this follow-on research
and peer review plan is to advance phenomena understanding
and/or validation, applications understanding, and operational
feasibility evaluation. This continued work will have a direct
bearing on DIA's ability both to assess the significance of
foreign research and to perform a systematic review of potential
applications of these phenomena.


(S/NF/SG/LIMDIS) Accomplishment of the various activities 
identified in this plan will further enhance threat assessment of 
foreign achievements in this area, and will help achieve the 
potential for U.S. military/intelligence applications on select 
tasks as a supplement to HUMINT operations. 


(U) It is anticipated that this plan will assist decision
makers in their review and consideration of future directions for
this field, and that this plan can begin formal implementation
starting in FY95.


(S/NF/SG/LIMDIS) In compliance with the Congressional 
conferees' request, DIA recommends that a period of six to nine 
months be set aside at the beginning of this new program for the 
purpose of identifying the most promising and cost-effective 
experiments to be conducted under the program to meet the overall 
research objectives outlined below. It is further suggested that 
a series of small working groups consisting of scientific experts 
from a variety of pertinent disciplines meet during this time
period to accomplish this end.


III. (U) SIGNIFICANCE OF EFFORT: 


(S/NF/SG/LIMDIS) STAR GATE is a dynamic approach for
pursuing the largely unexplored area of human consciousness and
subconsciousness interaction. Its scope is comprehensive: it
examines a wide range of phenomenological issues spanning
psychology, physiology/neurophysiology, physics, and other
leading-edge scientific areas. Although broad in scope, STAR GATE
is well grounded due to its solid independent scientific review
base. STAR GATE is based on a dynamic style in all its endeavors,
especially in its pursuit of on-going foreign activities in this
area.


(S/NF/SG/LIMDIS) One of the tasks previously levied on DIA
by the FY91 Defense Authorization Act was to develop a long-range
comprehensive plan for investigating parapsychological phenomena.
This task was one of several objectives included in a new program
for this phenomenological area that identified DIA as executive
agent. Moreover, the FY91 Defense Authorization Act authorized a
funding level of $2 million for DIA in order to initiate this new
program. As a result, a balanced and integrated plan to include
operations, foreign assessment, and research and development was
implemented. In addition, a new DIA limited dissemination
(LIMDIS) program, codeword STAR GATE, was established in order to
accomplish the objectives that were set forth in this plan.


(S/NF/SG/LIMDIS) The external research support conducted
under monies appropriated to date comes to a close in the June
1994 time-frame. When research activities utilizing human
subjects are interrupted, it has generally been necessary to
begin again rather than resume from the point of termination.
Consequently, it is important for the STAR GATE program to remain
stable. Research involving human use differs considerably from
that involving physical systems. For example, data from human
subjects cannot be collected or analyzed as rapidly, because
additional empirical data are often required to reach analytical
conclusions. This type of data analysis utilizing human subjects
can only be achieved with an in-place, uninterrupted, multi-year
research and development program. Therefore, should it be decided
to go forward with this program, it should be done in a timely
fashion.


(S/NF) The funding allocation for external research 
received by STAR GATE in FY91 and continued through FY 1993 
permitted several important research areas to be initiated and 
continued. It is anticipated that results of this research will 
assist in clarifying some of the possible future research 
directions; consequently, not all long-range research
possibilities can be identified in this plan. However, almost all
of the major investigation areas can be addressed, and many of
the specifics can be identified with reasonable confidence.

Figure 1 presents an overview of overall research objectives for
both Anomalous Cognition (AC) and Anomalous Perturbation (AP)
which will be considered for inclusion in this program.


(S/NF) Previous basic research activities from FY91
through FY93 focused on the following: (1) validating findings
from previous magnetoencephalograph (MEG) research and initiating 
new work with a variety of conditions and individuals; (2) 
performing a variety of anomalous cognition (AC) experiments to 
determine potential correlations (e.g., target type, 
environmental factors); (3) developing various theoretical 
constructs that might be testable and that could help explain the 
phenomena; (4) examining effects of altered states on data 
quality; (5) initiating review of and research into the 
energetics area; and (6) examining various application 
possibilities (e.g., communication, search). 


(U) Results from previous basic and applied research 
activity have been factored into this research and development 
plan and provide the basis upon which further R&D efforts will be 
built. 


IV. (U) PLAN OVERVIEW: 
A. (U) BASIC RESEARCH OBJECTIVES 


(S/NF/SG/LIMDIS) The objective of basic research is to 
understand the fundamental, underlying mechanisms for AMP. To 
achieve this objective in an efficient way, basic research of the 
detection mechanism should begin in a conservative direction. 
That is, assume that a putative "sensorial" system exists for AMP
and that it most likely will behave similarly to those common
elements known through the five senses. This conservative
approach generalizes to understanding the source of AMP and its
propagation mechanisms (Figure 1).


B. (U) APPLIED RESEARCH OBJECTIVES 


(S/NF/SG/LIMDIS) The objective of applied research is 
to improve AMP functioning to its maximum possible limit. To 
realize this objective, it is critical to define AMP output
measures that are consistent with a laboratory setting, an
operational environment, or both. The approach should also
reflect scientific conservatism. In investigating any single 
variable (e.g., different training methodologies) all other 
variables should remain as constant as possible (e.g., use the 
same individuals and known good target systems). 


C. (U) FOREIGN ASSESSMENT SUPPORT OBJECTIVES


(S/NF) From a research perspective, the objective of 
foreign assessment is to determine the degree to which claims 
from foreign laboratories can be confirmed in a U.S.-based 
setting. In science, replication is critical for understanding.


V. (U) BASIC RESEARCH PLAN FOR ANOMALOUS COGNITION:


A. (U) BASIC APPROACH 

(S/NF) The link of basic and applied research with
other applications investigations or with research activities is
shown in Figure 2. The top of the chart shows that for any
research or application task, certain conditions must be met
(e.g., a reliable calibrated individual is required; proper
scientific procedures need to be developed). Once these
basic foundations are laid, basic/applied research can be
initiated with a reasonable expectation of success and with
assurance that results will not be ambiguous or fail scientific
scrutiny.


(S/NF) This chart also illustrates the difference 
between basic and applied research; applied research relates to 
various methods for collecting, recording, improving and 
analyzing data output, while basic research is aimed at phenomena 
understanding. In this chart, the "detector" is the human 
brain/mind, the "source" is the target or an aspect of the 
target, and "transmission" refers to notions of how information 
and/or energy are actually transmitted between source and 
detector. 


(U) Figure 3 illustrates the interdisciplinary scope 
that will be brought to bear on this research problem. Leading- 
edge researchers in their various fields can provide clues, if 
not make direct contributions, that will assist in phenomena and 
applications understanding. Appendix C lists candidate research 
support facilities that could be involved in this long-range 
effort. Appendix D outlines pertinent research literature 
applicable to this field. Final selection will be based on how
well the activities of these institutions fit into the specific
time-lines and priorities to be established in FY95. Figure 4
lists milestones for the anomalous cognition basic research to be 
conducted under this plan. 


B. (U) RESEARCH DETAILS 
1. (U) Source. 


(S/NF/SG/LIMDIS) Source research will address 
those topics that show promise for understanding the 
characteristics of the target or target area that may play a role 
in anomalous cognition (AC) occurrence and data quality. Aspects 
of the target that can be defined by conventional information 
theory (involving entropy/information content) will be explored 
in-depth. A wide variety of targets with a wide range of 
information content, dynamics, or other parameters will be 
examined to explore this possible link. If not successful, other
approaches to investigate the target's innate nature and its
possible link to phenomenon occurrence will be initiated.
Definitive data in this area would also have implications for 
defining those targets which have the highest probability of 
successful data acquisition in an operational setting, thus 
establishing operational tasking parameters. 


2. (U) Transmission. 


(S/NF) The pursuit of possible transmission 
mechanisms for AC phenomena is essentially the most significant 
basic research task and also the most difficult to formulate. In 
this effort, a theoretical basis will be developed from 
extensions of current theory in light of recent advanced physics 
formulations. Some of these formulations permit unusual 
"information flows" that may, in fact, have relevance for this 
phenomenon. Testable models/constructs will be developed and 
evaluated. A variety of other possible explanations involving 
extensions of gravitation theory, quantum physics or other areas 
will be constructed and tested where possible. Some of these
tests may require close cooperation of leading-edge researchers
using equipment in their facility.


(C/NF) Effort in this area will also focus on 
integrating diverse aspects of the source, transmission, and 
detector categories. For example, it will examine how
"targeting" occurs. Insight will be drawn from in-depth reviews
of various unusual physical effects identified by physical
sciences researchers. These include distant particle coupling
(Bell's theorem), ideas from quantum gravity, possible
electrostatic/gravity interactions, unusual quantum physics,
observational theories, vacuum "energy" potential, and a variety
of other concepts.


(S/NF) Perhaps the most promising exploratory 
model of all is one based on little-understood aspects of the 
fundamental equations for electromagnetic wave propagation 
(Maxwell's equations). These equations indicate that forms of 
"wave propagation" could also exist that do not have the 
conventional electric or magnetic field components (i.e., vector 
and scalar waves). These waves would not be blocked by matter
and therefore could be leading candidates for AC propagation or
for certain aspects of the AC phenomenon. Research [redacted]
that these waves are considered a leading candidate for AC
transmissions by their researchers. Pilot study investigations in
this area were conducted by PAG-TA in FY92 with promising
preliminary results. Future research could couple with other DIA
exploratory R&D efforts currently under way in this area.


(S/NF/SG/LIMDIS) Research on this topic will be
closely integrated with research involving the anomalous
perturbation (AP) aspect, since findings in the AP area would
have direct implications for phenomena transmission mechanisms in
general. Findings from the target (or target source) research
area would also provide insight into possible transmission
mechanisms. For example, different forms of the same target
(e.g., target size, 2D vs 3D, holographic representations) may
show patterns in the AC data that might provide clues regarding
phenomena mechanisms.


3. (U) Detector. 


(U) The most important and promising route to
understanding the nature of the AC detection system in humans is
through modern advances in neuroscience. Earlier
neurophysiological results obtained from magnetoencephalograph
(MEG) measurements begun in FY92 will be validated and expanded.
This earlier work indicated that MEG correlations between visual
evoked response areas of the brain may exist, and that remote
stimuli might also be detectable in MEG data. Some of the
specific investigations will examine a variety of near- and
far-field situations, other sensory modes, and different types of
individuals in order to search for potential variables. It might
be possible, with advanced MEG instrumentation, to actually
locate the exact brain areas involved in AC phenomena occurrence.
Future research in this area could couple with research currently
being explored at the National Laboratory.


(U) Other physical/psychophysical aspects of the 
central nervous system (CNS) will also be explored to look for 
possible correlates. This would include galvanic skin responses
(GSR) or other parameters.


(U) Related to this overall area are several 
investigations that relate to possible environmental interactions 
with the brain that could affect AC data. This would include 
possible geomagnetic or electromagnetic influences. 


(S/NF) A spin-off from findings in this basic
research area could be unique communication applications.
MEG correlates might exist between remotely located people. If
so, transmission of remote messages (via a type of code) might be
possible. Since the AC phenomenon is not degraded by distance or
shielding, the potential of transmitting basic "messages" to
individuals in submarines would exist. Preliminary exploration of
this application by PAG-TA has yielded promising results.


(S/NF) Another potential spin-off benefit from
detector research in this program is that new insights into brain
memory or parallel processing might be achieved. This could lead
to new directions in advanced computer developments involving
neural networks. For example, recent [redacted] indicates that
"wave-like" brain activity occurs in addition to usual neuronal
processes. This wave-like phenomenon may have some link to the
"phase shift" observed in MEG data from the previous MEG project.
Further MEG work involving remote stimuli may help clarify such
issues.


4. (U) Integration.


(U) The basic research activities will liberally
avail themselves of the existing research communities that
specialize in neuroscience, physics and statistics and the
broader psychological/social sciences. Direct support from a
variety of university departments, national and international,
will be explored. PAG-TA contacts with such national laboratories
as Los Alamos, Lawrence Livermore, Oak Ridge, and [illegible]
have indicated an interest on their part in supporting the
research efforts. Frequent conferences and data exchanges are
anticipated. These data exchanges will ensure that a proper
interdisciplinary approach is maintained, and that findings from
other disciplines will be incorporated in this program where
appropriate. This peer group dialogue will greatly benefit
research sponsored through this plan: new ideas will be
generated, and clues regarding phenomena operation may be easier
to identify.


(U) Some specific interdisciplinary examples that 
will benefit this program are as follows: 


- In 1990 the American Anthropological
Association (AAA) formed a new division, the Society for the
Anthropology of Consciousness (SAC). This division has
established a technical journal to support interdisciplinary,
cross-cultural, experimental, and theoretical approaches to the
study of consciousness. This group may be able to contribute to
this program by providing cross-cultural examples. Its members
might also assist in the assessment of foreign data in this area.


- The psychophysiology of vision has already
contributed to the earlier program. This plan calls for a
collaborative effort with researchers in an attempt to understand
how the central nervous system processes subliminal stimuli. This
should assist in understanding how MEG correlates occur.


- The relationship between mind and body is
currently discussed in the research literature as well as in the
popular press. Researchers at the California Institute for
Transpersonal Psychology (CITP) have been active in investigating
the role of mental attitudes and body chemistry. While there may
not be a direct link with AC, an exchange of techniques and
experimental designs would be helpful.


- The Journal of Cognitive Neuroscience 
contains at least one article of interest in each issue. This 
discipline is where most of the cognitive work with
neuromagnetism is conducted. There is the possibility of joint
investigations with researchers performing MEG investigations at 
the National Institutes of Health (NIH). 


- Stanford University has been conducting 
research on internal mental imagery. The manipulation and 
control of this imagery is extremely important in understanding 
the source of internal noise during an AC session. A 
collaborative effort with Stanford should lead to methods for
noise reduction.


- Neural networks are particularly good at 
recognizing subtle patterns in complex data, and are being 
applied in the subjective arena of decision making in business. 
In order to improve AC analysis, the program will conduct a 
collaborative effort with scientists who are active in neural 
network research and with selected individuals who have had 
success with interpreting highly subjective data. 


- Statistics is the heart of AC research in 
that most of the results are usually quoted in statistical terms. 
Hypothesis testing has traditionally been the primary focus, but 
there are other possible approaches that should be explored. 
Statistics researchers at Harvard have already expressed interest 
in contributing to the research effort. 


- A major portion of the effort will be a
search for an AC evoked response in the brain. Sophisticated
processing is required in that magnetic signals from the brain
cannot be easily characterized by standard statistical
practices. Several research facilities can contribute.


- Classical statistical thermodynamics may be 
the heart of understanding the nature of an AC source of 
information. A physical property called entropy may be related 
to what is sensed by AC. The program intends to collaborate with 
a variety of university physics departments to calculate the 
appropriate parameters. 
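
As an illustration of the hypothesis-testing focus noted in the
statistics item above, the following minimal sketch computes an
exact one-sided binomial p-value for a series of trials scored as
hits or misses. The forced-choice protocol, the function name
binomial_p_value, and the chance rate of 0.25 are assumptions made
for illustration only; this plan does not specify a test or a
protocol.

```python
from math import comb


def binomial_p_value(hits: int, trials: int, chance: float) -> float:
    """Exact one-sided p-value P(X >= hits) for X ~ Binomial(trials, chance)."""
    if not 0 <= hits <= trials:
        raise ValueError("hits must lie between 0 and trials")
    # Sum the binomial probabilities of every outcome at least as extreme.
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical series: 40 trials, 4 choices per trial (chance = 0.25), 16 hits.
    print(f"p = {binomial_p_value(16, 40, 0.25):.4f}")
```

A small p-value indicates only that the hit count would be unusual
under chance; it says nothing about mechanism, which is one reason
other statistical approaches may also merit exploration.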

(S/NF) The specific experiments to be conducted in 
these research domains will be defined during the first six to 
nine months of the program, utilizing the recommendations of the
working groups mentioned above, subject to approval by the
Scientific Oversight Committee.


VI. (U) BASIC RESEARCH PLAN FOR ANOMALOUS PERTURBATION:


(S/NF) Figure 5 illustrates the basic approach for
investigating "energetics", or anomalous perturbation (AP)
phenomena. Intelligence reporting indicates that this aspect of
[illegible] attention in this research plan [illegible]
technological surprise. Thus, beginning in FY95, acceptance
criteria will be established with which to judge the historical
literature for potential AP effects. Using those criteria, a
detailed review of the literature will begin in mid FY95 and,
considering the size of that data base, will continue through
FY95. Knowledge gained from this review may provide insights for
the development of new AP target systems or provide data so that
particular experiments can be replicated. Given the complexity of
most AP experiments, considerable time is needed to plan and
conduct them properly. If the results warrant, application
development may begin as early as FY96; however, the primary task
of basic research on AP is to attempt to validate its existence.
Findings from foreign research will be examined and factored into
this activity as appropriate.

Figure 5 (U) Basic Research Milestones - Anomalous Perturbation
(To Include Biological Systems)


(S/NF) The keys to investigating this area will be in 
appropriate personnel selection and, very likely, in proper 
selection of the AP test device. Thus, the initial phase of this 
effort will involve identification and solicitation of 
individuals known or claimed to have such talents. For example, 
certain expert martial arts or yoga practitioners might do well 
in such experiments due to their strong mental conditioning and 
ability for intense mental focus. After locating such 
individuals, various instruments, such as microcomputer devices, 
sensitive electronic/sensor devices, or other unique or sensitive 
equipment would be used as targets in AP experiments. 


(S/NF) Some of the unique sensor candidates include 
devices that are highly sensitive to very weak gravitational 
effects (such as Mossbauer devices or atomic clocks). Perhaps
the most promising device is one that involves detection of an
unusual non-electromagnetic wave (a vector/scalar wave). If
experiments with such sensors are successful, significant
understanding of AP or AC phenomena would result. Experiments
with such a device are a distinct near-term possibility;
consequently, this will be given high priority in the early part
of this long-range program.


(S/NF) Should these pilot experiments prove successful,
near and distant experiments would be developed for a wide
variety of devices to evaluate application aspects. Potential
applications could include, for example, remote switching (in a
communication role) or possibly a countermeasure to minimize the
effectiveness of threat systems such as sensitive computer
components or sensors. Similarly, if these results are
successful, they would provide insight regarding potential
threats to U.S. systems or security.


(S/NF) The specific experiments to be conducted in these
research domains will be defined during the first six to nine
months of the program utilizing the recommendations of the
working groups mentioned above.


VII. (U) APPLIED RESEARCH PLAN FOR ANOMALOUS COGNITION: 
(U) Figure 6 illustrates the overall plan for the applied 
research portion for several main functional categories. 


a. (U) SELECTION 


(C) The most promising potential for selecting 
individuals is to identify ancillary activity that correlates 
with AC ability. If such a procedure can be identified, then 
receiver selection can be incorporated as part of other screening 
tests (e.g., fighter pilot candidacy), and thus large populations 
can be used. Among the items that will be examined are 
physiology (e.g., responses of the brain to external stimuli) and 
hypnotic susceptibility (i.e., an individual's predisposition for
being hypnotized). The results of this effort will be examined
continuously; however, a decision on whether to end the
investigation will be made in mid FY96. Should the results at
that time warrant, then refining of the techniques will continue
to the end of FY 1998. The reason the initial research spans
several years is that validating even one psychological finding
requires long-term testing of candidate individuals. Current
statistical methods
require many AC sessions, and experience has shown that only a 
few sessions can be conducted per week for any single individual. 
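
The scheduling argument above can be made concrete with a rough
calculation. The sketch below uses the standard normal
approximation for a one-sided, one-sample test of a proportion to
estimate how many sessions are needed, then converts that count to
calendar weeks. It assumes one scored trial per session; the
function names, the chance and true hit rates, the significance
level, the power, and the three-sessions-per-week rate are all
hypothetical values chosen for illustration, not figures from this
plan.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_needed(p_chance: float, p_true: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sessions required to detect p_true against p_chance.

    Normal-approximation sample size for a one-sided test of a proportion.
    """
    if not 0.0 < p_chance < p_true < 1.0:
        raise ValueError("require 0 < p_chance < p_true < 1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # critical value of the test
    z_power = NormalDist().inv_cdf(power)         # quantile for desired power
    numerator = (z_alpha * sqrt(p_chance * (1.0 - p_chance))
                 + z_power * sqrt(p_true * (1.0 - p_true)))
    return ceil((numerator / (p_true - p_chance)) ** 2)


def weeks_needed(sessions: int, sessions_per_week: int) -> int:
    """Calendar weeks needed for one individual at a fixed weekly session rate."""
    return ceil(sessions / sessions_per_week)


if __name__ == "__main__":
    # Hypothetical: chance rate 0.25, assumed true rate 0.35, 3 sessions per week.
    n = sessions_needed(0.25, 0.35)
    print(n, "sessions,", weeks_needed(n, 3), "weeks for one individual")
```

Under assumptions of this kind, a modest effect size pushes the
testing of a single individual toward the better part of a year,
which is consistent with the multi-year span described above.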


(C) The previous program was able to estimate
that approximately one percent of the general population 
possessed a high-quality, natural AC ability. Because the 
empirical method (i.e., asking large groups to attempt AC) is 
labor intensive and very inefficient, it is included in the 
research plan only as an alternate approach. 


b. (U) TRAINING 


(S/NF) Training has been a major part of the 
previous program; however, results of training approaches have 
been difficult to evaluate and have not been examined 
systematically. Systematic review of this issue was begun in FY 
92. One of the methods that will be examined involves lowering 
an individual's visual subliminal threshold (i.e., the level 
below which an individual is not consciously aware of visual 
material). This could enhance the individual's sensitivity to AC 
data. Other forms of altered states, such as dreaming and
hypnosis, will also be evaluated to see if such states can
enhance AC data quality.


(U) Results on these issues should be available
at the close of FY95. If no progress has been observed and if
there have been no positive results from the basic research, the
task ends. However, should any of the variables examined appear
promising, the task will be continued.


(S/NF) It is anticipated that all laboratory 
successes must be validated by simulating operational tasks. 
These experiments involve identifying the specialty to be tested, 
the acceptance criteria, and conducting sessions in which the
complete target systems are known. This three-year activity runs
concurrently with the other tasks but with a one-year offset to 
allow for planning. 


c. (U) TARGET/APPLICATION SELECTION 


(C) Based on earlier research, the most promising 
approach to target selection appears to be a single physical 
characteristic called entropy (i.e., a measure of inherent target 
information). Beginning in FY95, two and one half years have 
been allocated for the detailed study of this aspect of target 
properties. Initially, little experimentation is required; 
rather, a retrospective examination of previous target systems 
should indicate if this approach is valid. Included in this 
examination are detailed calculations of the information content 
of natural target scenes. 
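
As a rough illustration of what a calculation of the information content of a target scene could look like, the sketch below computes the Shannon entropy of a list of pixel intensities. The plan does not state how entropy is to be measured, so the function name, the use of a flat list of integer intensities, and the choice of Shannon entropy are all assumptions made for illustration only. 

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits per pixel, of a sequence of intensities.

    Assumption: a scene is represented as a flat sequence of integer
    intensity values; the source plan does not specify a representation.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("scene has no pixels")
    counts = Counter(pixels)
    # H = sum over intensity values of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical scenes: one featureless, one with 16 equally likely intensities.
uniform_scene = [128] * 16
varied_scene = list(range(16))

print(shannon_entropy(uniform_scene))
print(shannon_entropy(varied_scene))
```

A featureless scene scores zero under this measure and a scene with many equally frequent intensities scores high, which is the kind of ordering a retrospective examination of earlier target systems could test against session results. 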


(S/NF) Beginning in mid FY96, other potential 
intrinsic target properties will be examined. For example, a 
target may be more readily sensed by AC if the collection of 
elements at the site (e.g., landmarks, buildings, roads) 
constitutes a conceptually coherent unit as opposed to a collage 
of unrelated items. Quantitative definitions of targets will also 
be developed that include non-physical target parameters such as 
function, meaning, or relationships. These aspects are highly 
important in most operational projects and need to be quantified. 


(S/NF) Part of this effort will involve 
investigations that serve two purposes: (1) add insight into 
the phenomenon; and (2) help evaluate the feasibility of certain 
potential applications. For example, long distance experiments 
could be conducted to or from deep caves or submarines in deep 
water to test communication potential and transmission theories. 
Experiments could also be conducted to targets on board space 
platforms to test distance and gravitational effects. 
Experiments to or from magnetically shielded rooms or certain 
earth locations (e.g., the magnetic pole) might indicate if 
magnetic fields influence the phenomenon. Experiments to 
opposite sides of the earth might also indicate if a mass or 
gravity effect can be noted. 


(S/NF/SG/LIMDIS) This area of investigation will 
be integrated with a variety of applications in coordination with 
findings/investigations pursued by the in-house effort. Figure 9 
identifies the main application or operational areas, along with 
the types of data desired. This activity will be integrated, where 
possible, into in-house pursuits that will explore these areas in 
a systematic fashion. Initial emphasis will be in 
counternarcotics and counterterrorism areas. 


(S/NF/SG/LIMDIS) Specific types of applications 
that will be explored in depth include the search problem. 
Search tasks are expected to remain as high priority operational 
tasks (e.g., hostage location, lost equipment or system 
location). Search tasks are complicated by timing issues, 
especially if the missing target is being moved frequently. 
Related to this will be examination of predictive capability in 
order to evaluate feasibility of detecting hostile plans and 
intentions in advance. Pilot studies of other areas (e.g., code 
breaking, medical diagnostics, low intensity conflict support) 
will also be initiated. 


(S/NF/SG/LIMDIS) Another application area that 
will be examined is "communications". Previous research 
indicates that with proper protocols, basic or coded messages can 
be sent and received via AC procedures. Redundant coding methods 
can readily enhance probability of success, and new statistical 
methods can also improve success rates. Communication 
applications may have significant value for search problems by 
providing additional information on location of kidnapped or 
hostage victims. Such techniques might also help in determining 
hostage or POW state-of-health or other significant issues. 
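
The effect of redundant coding on the probability of success can be sketched with the simplest such scheme, a repetition code decoded by majority vote. The plan does not name the coding method used in previous research, so the scheme, the function names, and the assumption that repeated trials are statistically independent are all illustrative only. 

```python
from math import comb


def majority_decode(votes):
    """Decode one binary symbol (0 or 1) from an odd number of noisy receptions."""
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of repetitions to avoid ties")
    return 1 if sum(votes) > len(votes) // 2 else 0


def majority_success_probability(p_single, repetitions):
    """Probability that the majority of independent trials is correct.

    Assumption: each trial is correct with the same probability p_single,
    independently of the others.
    """
    k_min = repetitions // 2 + 1
    return sum(
        comb(repetitions, k) * p_single**k * (1 - p_single) ** (repetitions - k)
        for k in range(k_min, repetitions + 1)
    )


# Hypothetical per-trial accuracy of 0.6, sent once versus five times.
print(majority_success_probability(0.6, 1))
print(majority_success_probability(0.6, 5))
print(majority_decode([1, 0, 1, 1, 0]))
```

Under these assumptions repetition helps only when the per-trial accuracy exceeds one half; below that level, majority voting makes the decoded message less reliable. 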


d. (U) PROTOCOLS 


(U) Given the laboratory success of AC 
experimentation, the protocol task can build upon a substantial 
literature. Determining optimal, specialty-dependent protocols 
only requires extending current concepts. Several years are 
needed because of the statistical nature of the analysis 
required to determine the effects of environment, receiver, 
target, and feedback conditions. Several high-interest 
application areas (such as search/location) will be examined in 
detail. A variety of session procedures will be evaluated to 
determine those that are beneficial to improving data quality. 


(S/NF) Protocol effectiveness may be measured by 
quality, quantity, and/or usefulness of the AC information 
elicited by its use. The requirements for protocols that are 
designed for laboratory settings are considerably more 
restrictive than those required for operational settings. For 
example, providing limited information to a receiver while an 
operational session is in progress (i.e., intermediate feedback) 
might facilitate the acquisition of the desired data. This kind 
of feedback is strictly prohibited, however, in most protocols 
designed for laboratory experiments. Protocols may also vary 
depending on the nature of the data required. For example, for some 
search projects, general data alone may be adequate. Such 
cases would not require development of highly specific details, 
and protocols for the sessions would not be as complex. 


(U) A detailed protocol will need to consider a 
variety of potential session variables such as the individual's 
physical environment, mental state and attitude, and how the 
target or task is designated (e.g., coordinates, abstract terms). 
Other data include specifics of the session (monitor present or 
not), type of feedback, type of response data (e.g., predictive), 
and mode and method of response (e.g., drawings, verbal). 


(S/NF) Currently, the only known way to 
resolve the above issues is to conduct a large number of trials 
for a given individual with as many of the potential variables as 
possible held constant. Standard statistical methods can then be 
used to identify trends, patterns, and operational constraints. 
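
One standard statistical method that fits a long series of trials with a fixed protocol is the exact one-sided binomial test. The sketch below is illustrative only: the function name, the hit counts, and the chance rate of 0.25 (a four-choice judging scheme) are assumptions and do not come from the plan. 

```python
from math import comb


def binomial_tail(hits, trials, chance):
    """Return P(X >= hits) for X ~ Binomial(trials, chance).

    This is the exact one-sided p-value for observing at least `hits`
    successes when each trial succeeds by chance with probability `chance`.
    """
    if not 0 <= hits <= trials:
        raise ValueError("hits must lie between 0 and trials")
    if not 0.0 <= chance <= 1.0:
        raise ValueError("chance must be a probability")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical record: 40 hits in 120 trials with a chance rate of 0.25.
p_value = binomial_tail(40, 120, 0.25)
print(f"one-sided p-value: {p_value:.4f}")
```

Holding the other session variables constant matters here because the test assumes every trial shares the same chance rate; mixing protocols within one series would invalidate that assumption. 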


e. (U) DATA ANALYSIS 


(U) This area requires extensive review of 
leading analysis tools, such as those required for describing 
imprecise concepts or data (i.e., artificial intelligence 
techniques, fuzzy sets). This work will be combined with 
findings from neural network analysis and research, or possibly 
combinations of other emerging advanced analysis methods. 


(S/NF) Various approaches are anticipated to 
directly benefit operational evaluations. One promising 
technique involves procedures based on an adaptive (frequent data 
base update) approach. This will permit an individual's 
progression, and possibly time-dependent data variables in an 
individual's track record, to be identified. 
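
A minimal sketch of an adaptive track record follows, assuming each session yields a single quality score between 0 and 1 and that the record is refreshed after every session by exponential smoothing. The plan does not describe the update rule, so the smoothing method, the weight of 0.2, and the function names are hypothetical. 

```python
def update_track_record(previous, session_score, weight=0.2):
    """Blend the newest session score into a running estimate.

    Assumption: exponential smoothing; `weight` controls how quickly the
    estimate follows recent sessions (1.0 means only the latest counts).
    """
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must be in (0, 1]")
    return (1.0 - weight) * previous + weight * session_score


def track_record(scores, weight=0.2):
    """Return the running estimate after each session, seeded with the first score."""
    if not scores:
        return []
    history = [scores[0]]
    for score in scores[1:]:
        history.append(update_track_record(history[-1], score, weight))
    return history


# Hypothetical session scores for one individual, in chronological order.
print(track_record([0.3, 0.4, 0.2, 0.6, 0.7, 0.7]))
```

Because older sessions are discounted, a sustained rise or decline in the running estimate shows up within a few sessions, which is one way time-dependent changes in an individual's record could be flagged. 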


(S/NF) In addition to the search for new analysis 
methods, the current methods will also be reexamined. Laboratory 
requirements differ from those for operational activities in that 
the target can be controlled and well defined. For operational 
activities, uncertainties in tasking may arise, especially if 
operational requirements are changing or if some of the initial 
"known" data are incorrect. Such uncertainties complicate later 
analyses. 


(S/NF) Analysis methods will also be developed 
that can make predictions on data quality for any given task. 
This will require development of an extensive track record for 
each individual based on both controlled and operational 
projects. 


(S/NF) These analysis methods will also address 
certain practical issues. For example, a detailed, high-quality 
example of AC data may have little value to an intelligence 
analyst if that information was known from other sources. 
Likewise, a poor example of AC data might provide a single 
element as a tip-off for other assets, or provide the missing 
piece in a complex analysis, and thus be quite valuable. The 
intelligence utility of AC data may in some cases be only weakly 
connected to the AC quality. Therefore a data fusion analysis 
procedure is needed for AC-derived operational data. Methods 
that permit appropriate data analysis from an accuracy and 
utility viewpoint will be developed. 


f. (U) INTEGRATION 


(U) This activity would be an on-going review/ 
integration effort in order to identify patterns or clues useful 
for understanding practical aspects of this phenomenological 
area. 


(S/NF) Identifying approaches and procedures that 
permit assimilation of AC data from operational support projects 
into all-source intelligence analysis procedures will also be 
part of this support activity. Depending on results of applied 
research findings and operational pursuits, a basic seminar/ 
training program for other applications-oriented elements might 
be established. Such a training/seminar program would focus on 
basic techniques and would augment possible operational training 
activity that might become part of the in-house effort. This 
would require several years to develop and establish. 


(S/NF) The specific experiments to be conducted 
in these research domains will be defined during the first six to 
nine months of the program utilizing the recommendations of the SG1B 
working groups mentioned above. 


IX. (U) POTENTIAL RESEARCH RETURN: 


(S/NF/SG/LIMDIS) The in-house and external research 
pursuits identified in this overall research and peer review plan 
have the potential for achieving highly significant results using 
AMP to address problems of national security by pushing the 
phenomena to their natural limits. This overall result will be 
achieved by: 


- Determining the underlying physical 
mechanisms of AMP. 


- Isolating specific brain processes 
involved in the phenomenon. 


- Identifying unique applications 
involving "energetics" phenomena (e.g., 
remote switching). 


(S/NF/SG/LIMDIS) It is the intention of STAR GATE to 
pursue all aspects of this area with high intensity, drawing on 
an experienced and well-qualified staff along with appropriate 
external assistance, in order to quantify and evaluate all 
available classified and unclassified research. By so doing, 
discoveries into how these phenomena work may be achievable. How 
to identify people with such talent (or potential for it) and how 
to develop/train selected individuals should also be a natural 
end-result. STAR GATE will also draw heavily from lessons learned in 
all previous research and application investigations on a 
worldwide basis. 


X. (U) PROJECT OVERSIGHT METHODOLOGY: 
A. (U) PROGRAM MANAGEMENT/OVERSIGHT 


(S/NF) DIA, as executive agent, has implemented a 
management structure that fosters a proactive, responsive, and 
creative environment for this activity. Both external research 
and in-house activities are centered in one unit (PAG-TA) under 
the direct supervision of the Director, Office for Ground Forces 
(DIA/PAG). 


(S/NF) Project oversight for this program will be 
provided by a Project Review Board (PRB) composed of five senior 
management individuals selected from areas of DIA outside of the 
National Military Intelligence Production Center (NMIPC). In 
addition, a six-member Project Oversight Panel will be 
established to provide program and technical guidance on all STAR 
GATE activities. The 28-member DIA Advisory Board has been 
apprised of the STAR GATE program and its recommendations have 
been incorporated into project activities. Review/guidance is 
available from DIA's Executive Director and from the Deputy 
Director. The General Defense Intelligence Program (GDIP) staff 
director conducts periodic project reviews and provides guidance. 
Links with the Intelligence Community help provide a broader 
management and program review base for this activity. 


(U) The extensive nature and scope of these various 
program management and oversight activities will ensure that all 
activities identified in this long-range plan can be 
appropriately monitored and evaluated on an on-going basis. 


B. SCIENTIFIC OVERSIGHT 


(S/NF) Oversight for external contract activity is 
currently provided by a six-member expert Scientific Oversight 
Committee (SOC). A Human Use Review Board has also been 
established to provide expert guidance/advice regarding 
contractor adherence to appropriate DoD human use regulations. 


(U) There is currently in place a contractor 
Scientific Oversight Committee (SOC) which is tasked with three 
major responsibilities: 


a. Review and approve all experimental protocols 
prior to the collection of experimental data. 


b. Critically review all experimental final 
reports as if they were submissions to technical scientific 
journals. All remarks in writing are included in the final 
technical reports to DIA. 


c. Suggest directions for further research. 


(U) In addition to these responsibilities, the SOC 
members are encouraged to exercise unannounced drop-in 
privileges to view experiments in progress. 


(U) The five voting members of the SOC are respected 
scientists from the following disciplines: physics, astronomy, 
statistics, neuroscience, and psychology. See Appendix E for 
membership data. 


(U) A contractor Institutional Review Board (IRB) is 
currently in place with the responsibility of assuring compliance 
with all U.S. and DoD regulations with regard to the use of 
humans in experimentation and assuring their safety. The IRB 
members represent the health, legal, and spiritual professions in 
accordance with government guidelines. See Appendix F for 
membership data. 


(U) It is anticipated that oversight of this program 
will be conducted by these Committees, if available, or new 
committees with equivalent scientific credentials. 


XI. (U) DEVELOPMENT OF EVALUATION CRITERIA: 
A. (U) SCIENTIFIC VALIDITY 


(S/NF) The STAR GATE Scientific Advisory Committee has 
determined that the scientific validity of the STAR GATE program 
has been satisfactorily demonstrated under the most demanding of 
experimental protocols. A statistically significant anomaly 
does exist which cannot be currently explained by conventional 
means. For example, 77% of academics in the arts, humanities, 
and education believe that AMP is either an established fact or a 
likely possibility. Supporting technical evidence contained in 
technical studies may be found at Appendix G. 


(S/NF) A substantial number of examples dating back to 
1972 provide at a minimum prima facie evidence that AMP can be 
used in such a way as to provide a "value-added" function to the 
Intelligence Community. Appendix H is a formal evaluation of the 
use of AMP for intelligence gathering purposes conducted in 1987. 
The overall findings of this evaluation were that "...the Project 
Review Group has determined to its satisfaction that the work of 
the Enhanced Human Performance Group is scientifically 
sound...and is providing valuable insight into the nature of an 
anomaly which have a significant impact on the DoD." 


B. (U) PERFORMANCE 


(S/NF) The ability of the STAR GATE program to produce 
results that have an intelligence value can only be measured by 
customer evaluations. AMP-provided intelligence data, along with 
other forms of intelligence, are evaluated, in part, with 
subjective criteria. STAR GATE will develop feedback mechanisms 
and procedures for customers that will result in a method of 
quantifying this subjective feedback and evaluation data so that 
the value added and cost-effectiveness can be measured. 


XII. (U) BUDGET AND RESOURCE REQUIREMENTS 
(FYs 95-99): 


(S/NF/SG/LIMDIS) Due to the diversity of the STAR GATE 
mission/objectives, both external resources and in-house 
expertise are required. Since this Activity possesses no in-house 
R&D capability, external R&D support is absolutely 
required to meet Congressional concerns, which are addressed in 
this program plan. A balance will be maintained between external 
and in-house activities, and every effort will be made to 
integrate and link these activities where appropriate. The 
external aspect permits a wide range of expertise covering many 
disciplines to be focused on this area; this also has the benefit 
of ensuring peer group review and of facilitating a variety of 
scientific interactions. In-house personnel with a wide range of 
expertise in this phenomenology will need to be retained to make 
this proposed plan work. 


(S/NF) In order to review the major tenets of the draft 
program plan, the Defense Intelligence Agency will convene a 
panel of appropriate scientists to provide recommendations on the 
plan and the research it achieves. Based on the panel's 
recommendations, the Defense Intelligence Agency will then submit 
a budget line item to fund those approved objectives. 


(C) An annual report will document the current 
operational, technical and administrative status of the program. 


APPENDIX A 
CONGRESSIONALLY-DIRECTED ACTION 


DEFENSE AUTHORIZATION 


(S/NF) REQUEST: "The conferees are concerned that insufficient 
funds have been spent on research and development to establish 
the scientific basis for the STAR GATE program. The conferees 
direct the Director of DIA to prepare a program plan and to 
submit an appropriate budget request for a research effort, over 
several years, to determine whether the STAR GATE program can 
show results that are cost-effective and satisfy reasonable 
performance criteria. This plan, and any research under this 
program, should be subject to peer review by neutral scientific 
experts. The Director of DIA is directed to prepare this 
research and peer review plan within existing program funds." 


APPENDIX B 


TERMINOLOGY AND DEFINITIONS 


(U) PHENOMENA TERMINOLOGY: 


(U) This phenomenological area has had a variety of 
descriptive terms over the years, such as paranormal, 
parapsychological, or psychical research. Foreign researchers 
use other terms: "psychoenergetics" in the USSR; "extraordinary 
human function" in the People's Republic of China (PRC). In 
general, this field is concerned with a largely unexplored area 
of human consciousness/subconsciousness interactions associated 
with unusual or underdeveloped human capabilities. 


(U) Recently, researchers have shown a preference for terms 
that are neutral and that emphasize the anomalous or enigmatic 
nature of these phenomena. The term anomalous mental phenomena 
(AMP) is generally preferred. 


(U) This area has two aspects: information access and 
energetics influence. Information access refers to a mental 
ability to describe remote areas or to access concealed data that 
are otherwise shielded from all known sensory channels. A recent 
term for this ability is anomalous cognition (AC). This term 
places emphasis on potential understanding that might be 
available from advances in sensory/brain functioning research or 
other related research. Older terms for this aspect have 
included extra-sensory perception (ESP), remote viewing (RV), and 
in some cases, precognition. 


(U) The energetics aspect refers to the ability to 
influence, via mental volition, physical or biological systems by 
an as yet unknown physical mechanism. An example of physical 
system influence would include affecting the output of sensors or 
electronic devices; biological systems influence would include 
affecting physiological parameters of an individual. A recent 
descriptive term for this ability is anomalous perturbation (AP). 
Older terms for this phenomenon included psychokinesis (PK) or 
telekinesis. 


(U) GENERAL DEFINITIONS: 


(S/NF) For this program, basic research is defined to mean 
any investigation or experiment for determining fundamental 
processes or for uncovering underlying parameters that are 
involved in this phenomenon. Basic research is primarily 
oriented toward understanding the physical, physiological, and 
psychological mechanisms of anomalous mental phenomena (AMP). 


(S/NF) Applied research refers to any investigation 
directed toward developing particular applications or for 
improving data quality and reliability. For the anomalous cognition 
(AC) phenomenon, research is primarily directed toward improving 
the output quality of AC data. This would include ways to 
develop/improve utility of AC data for a variety of potential 
applications. For example, examination of spatial and temporal 
relationships of AC data could assist in developing a reliable 
search capability useful for locating missing people or 
equipment. 


APPENDIX C 


POTENTIAL RESEARCH SUPPORT FACILITIES 


ANOMALOUS MENTAL PHENOMENA 

Science Applications International Corp.                   Los Altos, CA 
Mind Science Foundation                                    San Antonio, TX 
Princeton Engineering Anomalies Laboratory                 Princeton Univ, NJ 
American Society for Psychical Research                    New York, NY 
St. John's University                                      Long Island, NY 
Foundation for Research into the Nature of Man             Durham, NC 
ARE/Atlantic University                                    Virginia Beach, VA 
University of Virginia                                     Charlottesville, VA 
Psychophysical Research Laboratories                       Edinburgh, Scotland 
Edinburgh University                                       Edinburgh, Scotland 


OTHER RELATED DISCIPLINES 

Psychology 
  Stanford University                                      Stanford, CA 
  Cornell University                                       Ithaca, NY 

Anthropology 
  University of California                                 Berkeley, CA 
  University of Arizona                                    Tucson, AZ 

Psychophysiology 
  SRI International                                        Menlo Park, CA 
  Langley Porter Neuropsychiatric Institute                San Francisco, CA 
  Menninger Foundation                                     Topeka, KS 

Psychoimmunology 
  California Institute for Transpersonal Psychology        Menlo Park, CA 

Cognitive Neuroscience 
  Los Alamos National Laboratory                           Los Alamos, NM 
  Sandia National Laboratory                               Albuquerque, NM 
  University of California                                 San Diego, CA 

Cognitive Psychology 
  Psychology Department, Princeton Univ                    Princeton, NJ 
  Psychology Department, City College of New York          New York, NY 

Artificial Intelligence 
  Massachusetts Institute of Technology                    Cambridge, MA 
  Stanford University                                      Stanford, CA 

Neural Networks 
  Massachusetts Institute of Technology                    Cambridge, MA 
  Science Applications International Corp                  Los Altos, CA 

Statistics/Signal Analysis 
  University of California                                 Davis, CA 
  Harvard University                                       Cambridge, MA 

Thermodynamics 
  Rochester University                                     Rochester, NY 
  Physics Department, Stanford University                  Stanford, CA 

Quantum Measurement 
  International Business Machines, Research Laboratories   College Park, MD 

General Relativity 
  California Institute of Technology                       Pasadena, CA 
  University of Texas at Austin                            Austin, TX 

Electromagnetic/Basic Research 
  Electronetics Corp                                       Buffalo, NY 
  Battelle Corp                                            Columbus, OH 
  Institute for Advanced Study                             Austin, TX 


APPENDIX D 


RESOURCE LITERATURE 


1. A.R.E. Journal 
2. Abnormal Hypnotic Phenomena 
3. American Anthropologist 
4. American Ethnologist 
5. American Journal of Clinical Hypnosis 
6. American Journal of Physiology 
7. American Journal of Sociology 
8. American Psychologist 
9. American Society for Psychical Research 
10. Annals of Eugenics 
11. Annals of Mathematical Statistics 
12. Annales de Sciences Psychiques 
13. Archivio di Psicologia, Neurologia e Psichiatria 
14. Association for the Anthropological Study of Consciousness 
Newsletter 
15. Behavioral and Brain Science 
16. Behavioral Science 
17. Bell System Technical Journal 
18. Biological Psychiatry 
19. Biological Review 
20. British Journal for the Philosophy of Science 
21. British Journal of Psychology 
22. Bulletin of the American Physical Research 
23. Bulletin of the Boston Society for Psychic Research 
24. Bulletin of the Los Angeles Neurological Societies 
25. Contributions to Asian Studies 
26. Electroencephalography and Clinical Neurophysiology 
27. Endeavour 
28. Ethnology 
29. Exceptional Human Experience 
30. Experientia 
31. Experimental Medicine and Surgery 
32. Fate 
33. Fields within Fields 
34. Foundations of Physics 
35. Hibbert Journal 
36. Human Biology 
37. International Journal of Clinical and Experimental Hypnosis 
38. International Journal of Comparative Sociology 
39. International Journal of Neuropsychiatry 
40. International Journal of Parapsychology 
41. International Journal of Psychoanalysis 
42. Journal of Abnormal and Social Psychology 
43. Journal of Altered States of Consciousness 
44. Journal of Applied Physics 
45. Journal of Applied Psychology 
46. Journal of Asian and African Studies 
47. Journal of Biophysical and Biochemical Cytology 
48. Journal of Cell Biology 
49. Journal of Communication 
50. Journal of Comparative and Physiological Psychology 
51. Journal of Consulting Psychology 
52. Journal of Existential Psychiatry 
53. Journal of Experimental Biology 
54. Journal of Experimental Psychology 
55. Journal of General Psychology 
56. Journal of Genetic Psychology 
57. Journal of Mind and Behavior 
58. Journal of Nervous and Mental Diseases 
59. Journal of Personality 
60. Journal of Personality and Social Psychology 
61. Journal of Research in PSI Phenomena 
62. Journal of Scientific Exploration 
63. Journal of the American Academy of Psychoanalysis 
64. Journal of the London Mathematical Society 
65. Journal of the Royal Anthropological Institute of Great 
Britain and Ireland 
66. Metapsichica 
67. Mind-Brain Bulletin 
68. Motivation and Emotion 
69. Nature 
70. Naturwissenschaftliche Rundschau 
71. New Horizons 
72. New Scientist 
73. New Sense Bulletin 
74. Newsletter of the Parapsychology Foundation 
75. Parapsychology Bulletin 
76. Parapsychology Abstracts International 
77. Parapsychology Review 
78. Perceptual and Motor Skills 
79. Philosophy of Science 
80. Physiology and Behavior 
81. Proceedings of the Society for Psychical Research 
82. Psychedelic Review 
83. Psychic 
84. Psychic Science 
85. Psychoanalytic Quarterly 
86. Psychoanalytic Review 
87. Psychological Bulletin 
88. Psychometrika 
89. Psychophysiology 
90. Physics Today 
91. Renti Teyigongneng (EFHB Research) [PRC] 
92. Revue Metapsychique 
93. Revue Philosophique 
94. Revue Philosophique de la France et de L'Etranger 
95. Revue Philosophique Applique 
96. Science 
97. Skeptical Inquirer 
98. Social Studies of Science 
99. Subtle Energies 
100. The Humanistic Psychology Institute 
101. The Journal of Parapsychology 
102. The Journal of the American Society for Psychical Research 
103. Theta 
104. Tijdschrift voor Parapsychologie 
105. Tomorrow 
106. Voprosy Filosofii (Questions of Philosophy) [RUSSIA] 
107. Western Canadian Journal of Anthropology 
108. Zeitschrift fur die Gesamte Neurologie und Psychiatrie 
109. Zeitschrift fur Parapsychologie und Grenzgebiete der 
Psychologie 
110. Zeitschrift fur Tierpsychologie 
111. Zeitschrift fur Vergleichende Physiologie 
112. Zetetic Scholar 
113. Zhongguo Shehui Kexue (China Social Sciences) [PRC] 
114. Ziran Zazhi (Nature) [PRC] 


APPENDIX E 
CURRENT CONTRACTOR SCIENTIFIC OVERSIGHT COMMITTEE MEMBERSHIP 


Steven A. Hillyard 
- Professor of Neurosciences, Department of Neurosciences, 
University of California, San Diego. 
- Author or coauthor of 118 technical neuroscience 
publications. 
- Eighty-two invited presentations at technical conferences. 
- Ph.D., Yale University, 1968 (Psychology). 


S. James Press 
- Professor of Statistics, Department of Statistics, University 
of California, Riverside. 
- Author or coauthor of 132 statistics publications. 
- Author of 12 books and/or monographs. 
- Ph.D., Stanford University, 1964 (Statistics). 


Garrison Rapmund 
- Responsible for facilitating transfer of Strategic 
Defense Initiative technologies to health care industries. 
- Major General, USA retired in 1986 as Assistant Surgeon 
General (R&D) and Commander, Army Medical R & D Command. 
- M.D., Columbia University, 1953 (Pediatrics). 


Melvin Schwartz 
- Associate Director for High Energy and Nuclear Physics, 
Brookhaven National Laboratory. 
- Author or coauthor of 40 technical publications in high energy 
physics, author of "Principles of Electrodynamics." 
- Nobel Prize, Physics (1988). 
- Ph.D., Columbia University, 1958 (Physics). 


Yervant Terzian 
- Professor of Physical Sciences, Chairman of the Department of 
Astronomy, Cornell University. 
- Author/coauthor of numerous technical publications and books. 
- Ph.D., Indiana University, 1965 (Astronomy). 


Philip G. Zimbardo 
- Professor of Psychology, Department of Psychology, Stanford 
University. 
- Author/coauthor of numerous experimental psychology 
publications. 
- Ph.D., Yale University, 1959 (Psychology). 


APPENDIX F 
CURRENT CONTRACTOR INSTITUTIONAL REVIEW BOARD MEMBERSHIP 


Byron Wm. Brown, Jr., Ph.D. 
- Biostatistics, Stanford University 


Gary R. Fujimoto, M. D. 
- Occupational Medicine, Palo Alto Medical Foundation 


John Hanley, M. D. 
- Neuropsychiatry, University of California, Los Angeles 


Robert B. Livingston, M. D. 
- Neuroscience, University of California, San Diego 


Robin P. Michelson, M. D. 
- Otolaryngology, University of California, San Francisco 


Ronald Y. Nakasone, Ph.D. 
- Buddhist Studies, Institute of Buddhist Studies, Berkeley, CA 


Garrison Rapmund, M. D. (Chair) 
- Air Force Science Advisory Board 


Louis J. West, M. D. 
- Neuropsychiatry, University of California, Los Angeles 


APPENDIX G 
ACADEMIC STUDIES REGARDING THE SCIENTIFIC VALIDITY OF AMP 


Psychological Bulletin (January, 1994) 


Version 4.7 
October 1, 1993 


Does Psi Exist? 
Replicable Evidence for an 
Anomalous Process of Information Transfer 


Daryl J. Bem and Charles Honorton 


Most academic psychologists do not yet accept the existence of psi, anomalous processes of 
information or energy transfer (such as telepathy or other forms of extrasensory perception) that 
are currently unexplained in terms of known physical or biological mechanisms. We believe 
that the replication rates and effect sizes achieved by one particular experimental method, the 
ganzfeld procedure, are now sufficient to warrant bringing this body of data to the attention of 
the wider psychological community. Competing meta-analyses of the ganzfeld database are 
reviewed, one by R. Hyman (1985), a skeptical critic of psi research, and the other by C. Honorton 
(1985), a parapsychologist and major contributor to the ganzfeld database. Next the results of 
11 new ganzfeld studies that comply with guidelines jointly authored by R. Hyman and C. 
Honorton (1986) are summarized. Finally, issues of replication and theoretical explanation are 
discussed. 


The term psi denotes anomalous processes of information 
or energy transfer, processes such as telepathy or 
other forms of extrasensory perception that are currently 
unexplained in terms of known physical or biological 
mechanisms. The term is purely descriptive: It neither 
implies that such anomalous phenomena are paranormal 
nor connotes anything about their underlying mechanisms. 

Does psi exist? Most academic psychologists don't think 
so. A survey of more than 1,100 college professors in the 
United States found that 55% of natural scientists, 66% of 
social scientists (excluding psychologists), and 77% of academics 
in the arts, humanities, and education believed 
that ESP is either an established fact or a likely possibility. 
The comparable figure for psychologists was only 34%. 
Moreover, an equal number of psychologists declared ESP 
to be an impossibility, a view expressed by only 2% of all 
other respondents (Wagner & Monnet, 1979). 


Daryl J. Bem, Department of Psychology, Cornell University; 
Charles Honorton, Department of Psychology, University of 
Edinburgh, Edinburgh, Scotland. 

Sadly, Charles Honorton died of a heart attack on November 4, 
1992, 9 days before this article was accepted for publication. He 
was 46. Parapsychology has lost one of its most valued contributors. 
I have lost a valued friend. 

This collaboration had its origins in a 1983 visit I made to 
Honorton's Psychophysical Research Laboratories (PRL) in 
Princeton, New Jersey, as one of several outside consultants 
brought in to examine the design and implementation of the 
experimental protocols. 

Preparation of this article was supported, in part, by grants to 
Charles Honorton from the American Society for Psychical 
Research and the Parapsychology Foundation, both of New York 
City. The work at PRL summarized in the second half of this 
article was supported by the James S. McDonnell Foundation of St. 
Louis, Missouri, and by the John E. Fetzer Foundation of 
Kalamazoo, Michigan. 

Helpful comments on drafts of this article were received from 
Deborah Delanoy, Edwin May, Donald McCarthy, Robert Morris, 
John Palmer, Robert Rosenthal, Lee Ross, Jessica Utts, Philip 
Zimbardo, and two anonymous reviewers. 

Correspondence concerning this article should be addressed to 
Daryl J. Bem, Department of Psychology, Uris Hall, Cornell 
University, Ithaca, New York 14853. (Electronic mail may be 
sent to d.bem@cornell.edu) 


Psychologists are probably more skeptical about psi for 
several reasons. First, we believe that extraordinary 
claims require extraordinary proof. And although our col- 
leagues from other disciplines would probably agree with 
this dictum, we are more likely to be familiar with the 
methodological and statistical requirements for sustaining 
such claims, as well as with previous claims that failed ei- 
ther to meet those requirements or to survive the test of 
successful replication. Even for ordinary claims, our con- 
ventional statistical criteria are conservative. The sacred 
p = .05 threshold is a constant reminder that it is far more 
sinful to assert that an effect exists when it does not (the 
Type I error) than to assert that an effect does not exist 
when it does (the Type II error). 


Second, most of us distinguish sharply between phe- 
nomena whose explanations are merely obscure or contro- 
versial (e.g., hypnosis) and phenomena such as psi that 
would appear to fall outside our current explanatory 
framework altogether. (Some would characterize this as 
the difference between the unexplained and the inexplica- 
ble.) In contrast, many laypersons treat all exotic psycho- 
logical phenomena as epistemologically equivalent; many 
even consider déjà vu to be a psychic phenomenon. The 
blurring of this critical distinction is aided and abetted by 
the mass media, “new age” books and mind-power courses, 
and “psychic” entertainers who present both genuine hyp- 
nosis and fake “mind reading” in the course of a single 
performance. Accordingly, most laypersons would not 
have to revise their conceptual model of reality as radi- 
cally as we would to assimilate the existence of psi. For 
us, psi is simply more extraordinary. 

Finally, research in cognitive and social psychology has 
sensitized us to the errors and biases that plague intuitive 
attempts to draw valid inferences from the data of every- 
day experience (Gilovich, 1991; Nisbett & Ross, 1980; 
Tversky & Kahneman, 1971). This leads us to give virtu- 
ally no probative weight to anecdotal or journalistic re- 
ports of psi, the main source cited by our academic col- 
leagues as evidence for their beliefs about psi (Wagner & 
Monnet, 1979). 

Ironically, however, psychologists are probably not more 
familiar than others with recent experimental research on 
psi. Like most psychological research, parapsychological 
research is reported primarily in specialized journals; un- 
like most psychological research, however, contemporary 


parapsychological research is not usually reviewed or 

ANOMALOUS INFORMATION TRANSFER 


Meta-Analyses of the Ganzfeld Database 


In 1985 and 1986, the Journal of Parapsychology de- 
voted two entire issues to a critical examination of the 
ganzfeld database. The 1985 issue comprised two contri- 
butions: (a) a meta-analysis and critique by Ray Hyman 
(1985), a cognitive psychologist and skeptical critic of 
parapsychological research, and (b) a competing meta- 
analysis and rejoinder by Charles Honorton (1985), a 
parapsychologist and major contributor to the ganzfeld 
database. The 1986 issue contained four commentaries on 
the Hyman-Honorton exchange, a joint communiqué by 
Hyman and Honorton, and six additional commentaries 
on the joint communiqué itself. We summarize the major 
issues and conclusions here. 


Replication Rates 


Rates by study. Hyman’s meta-analysis covered 42 psi 
ganzfeld studies reported in 34 separate reports written 
or published from 1974 through 1981. One of the first 
problems he discovered in the database was multiple 
analysis. As noted earlier, it is possible to calculate sev- 
eral indexes of psi performance in a ganzfeld experiment 
and, furthermore, to subject those indexes to several kinds 
of statistical treatment. Many investigators reported mul- 
tiple indexes or applied multiple statistical tests without 
adjusting the criterion significance level for the number of 
tests conducted. Worse, some may have “shopped” among 
the alternatives until finding one that yielded a signifi- 
cantly successful outcome. Honorton agreed that this was 
a problem. 

Accordingly, Honorton applied a uniform test on a 
common index across all studies from which the pertinent 
datum could be extracted, regardless of how the investiga- 
tors had analyzed the data in the original reports. He se- 
lected the proportion of hits as the common index because 
it could be calculated for the largest subset of studies: 28 
of the 42 studies. The hit rate is also a conservative index 
because it discards most of the rating information; a sec- 
ond place ranking—a “near miss”—receives no more 
credit than a last place ranking. Honorton then calculated 
the exact binomial probability and its associated z score 
for each study. 

Of the 28 studies, 23 (82%) had positive z scores (p = 
4.6 × 10^-4, exact binomial test with p = q = .5). Twelve of 
the studies (43%) had z scores that were independently 
significant at the 5% level (p = 3.5 × 10^-9, binomial test 
with 28 studies, p = .05, and q = .95), and 7 of the studies 
(25%) were independently significant at the 1% level (p = 
9.8 × 10^-9). The composite Stouffer z score across the 28 
studies was 6.60 (p = 2.1 × 10^-11).1 A more conservative 
estimate of significance can be obtained by including 10 
additional studies that also used the relevant judging pro- 
cedure but did not report hit rates. If these studies are as- 
signed a mean z score of zero, the Stouffer z across all 38 
studies becomes 5.67 (p = 7.3 × 10^-9). 

1Stouffer’s z is computed by dividing the sum of the z scores for 
the individual studies by the square root of the number of studies 
(Rosenthal, 1978). 

Thus, whether one considers only the studies for which 
the relevant information is available or includes a null es- 
timate for the additional studies for which the information 
is not available, the aggregate results cannot reasonably 
be attributed to chance. And, by design, the cumulative 
outcome reported here cannot be attributed to the infla- 
tion of significance levels through multiple analysis. 
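
The tail probabilities and the Stouffer combination quoted in this section can be checked with a few lines of standard-library Python. This is a minimal sketch rather than Honorton's original computation: the function names are illustrative, and the use of a one-tailed normal probability for the Stouffer z is an assumption.

```python
import math

def binom_upper_tail(successes, n, p):
    """Exact P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

def normal_upper_tail(z):
    """One-tailed P(Z >= z) for a standard normal variate (assumed convention)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def stouffer_z(z_scores):
    """Stouffer's z: sum of the study z scores over sqrt(number of studies)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# 23 of 28 studies with positive z scores, chance probability .5
p_positive = binom_upper_tail(23, 28, 0.5)
# 12 of 28 studies individually significant at the 5% level
p_sig_05 = binom_upper_tail(12, 28, 0.05)
# 7 of 28 studies individually significant at the 1% level
p_sig_01 = binom_upper_tail(7, 28, 0.01)

# Adding 10 studies with z = 0 leaves the sum of z scores unchanged
# but divides it by sqrt(38) instead of sqrt(28).
z_28 = 6.60
z_38 = z_28 * math.sqrt(28) / math.sqrt(38)

print(f"{p_positive:.2e} {p_sig_05:.2e} {p_sig_01:.2e}")
print(f"z_38 = {z_38:.2f}, p = {normal_upper_tail(z_38):.2e}")
```

Because the individual z scores are not listed here, the 38-study figure is obtained by rescaling the reported composite rather than by calling `stouffer_z` on the raw values.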

Rates by laboratory. One objection to estimates such as 
those just described is that studies from a common labora- 
tory are not independent of one another (Parker, 1978). 
Thus, it is possible for one or two investigators to be dis- 
proportionately responsible for a high replication rate 
whereas other, independent investigators are unable to 
obtain the effect. 

The ganzfeld database is vulnerable to this possibility. 
The 28 studies providing hit rate information were con- 
ducted by investigators in 10 different laboratories. One 
laboratory contributed 9 of the studies, Honorton's own 
laboratory contributed 5, 2 other laboratories contributed 
3 each, 2 contributed 2 each, and the remaining 4 labora- 
tories each contributed 1. Thus, half of the studies were 
conducted by only 2 laboratories, 1 of them Honorton’s 
own. 

Accordingly, Honorton calculated a separate Stouffer z 
score for each laboratory. Significantly positive outcomes 
were reported by 6 of the 10 laboratories, and the com- 
bined z score across laboratories was 6.16 (p = 3.6 × 
10^-10). Even if all of the studies conducted by the 2 most 
prolific laboratories are discarded from the analysis, the 
Stouffer z across the 8 other laboratories remains signifi- 
cant (z = 3.67, p = 1.2 × 10^-4). Four of these studies are 
significant at the 1% level (p = 9.2 × 10^-6, binomial test 
with 14 studies, p = .01, and q = .99), and each was con- 
tributed by a different laboratory. Thus, even though the 
total number of laboratories in this database is small, 
most of them have reported significant studies, and the 
significance of the overall effect does not depend on just 
one or two of them. 


Selective Reporting 


In recent years, behavioral scientists have become in- 
creasingly aware of the “file-drawer” problem: the likeli- 
hood that successful studies are more likely to be pub- 
lished than unsuccessful studies, which are more likely to 
be consigned to the file drawers of their disappointed in- 
vestigators (Bozarth & Roberts, 1972; Sterling, 1959). 
Parapsychologists were among the first to become sensi- 
tive to the problem, and, in 1975, the Parapsychological 
Association Council adopted a policy opposing the selec- 
tive reporting of positive outcomes. As a consequence, 
negative findings have been routinely reported at the as- 
sociation’s meetings and in its affiliated publications for 
almost two decades. As has already been shown, more 
than half of the ganzfeld studies included in the meta- 
analysis yielded outcomes whose significance falls short of 
the conventional .05 level. 

A variant of the selective reporting problem arises from 
what Hyman (1985) has termed the “retrospective study.” 
An investigator conducts a small set of exploratory trials. 
If they yield null results, they remain exploratory and 
never become part of the official record; if they yield posi- 
tive results, they are defined as a study after the fact and 
are submitted for publication. In support of this possibil- 
ity, Hyman noted that there are more significant studies 
in the database with fewer than 20 trials than one would 
expect under the assumption that, all other things being 
equal, statistical power should increase with the square 
root of the sample size. Although Honorton questioned the 

In psi ganzfeld studies, the hit rate itself provides a 
straightforward descriptive measure of effect size, but this 
measure cannot be compared directly across studies be- 
cause they do not all use a four-stimulus judging set and, 
hence, do not all have a chance baseline of .25. The next 
most obvious candidate, the difference in each study be- 
tween the hit rate observed and the hit rate expected un- 
der the null hypothesis, is also intuitively descriptive but 
is not appropriate for statistical analysis because not all 
differences between proportions that are equal are equally 
detectable (e.g., the power to detect the difference between 
.55 and .25 is different from the power to detect the differ- 
ence between .50 and .20). 

To provide a scale of equal detectability, Cohen (1988) 
devised the effect size index h, which involves an arcsine 
transformation on the proportions before calculation of 
their difference. Cohen's h is quite general and can assess 
the difference between any two proportions drawn from 
independent samples or between a single proportion and 
any specified hypothetical value. For the 28 studies exam- 
ined in the meta-analyses, h was .28, with a 95% confi- 
dence interval from .11 to .45. 
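
As a brief illustration of why equal raw differences are not equally detectable, the following sketch computes Cohen's h for the two pairs of proportions mentioned above. It assumes the usual definition of the arcsine transformation, 2·arcsin(√p); the function name is illustrative.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    phi1 = 2 * math.asin(math.sqrt(p1))
    phi2 = 2 * math.asin(math.sqrt(p2))
    return phi1 - phi2

# Both pairs differ by .30 in raw proportion, yet h differs.
h_a = cohens_h(0.55, 0.25)
h_b = cohens_h(0.50, 0.20)
print(round(h_a, 3), round(h_b, 3))
```

The two values come out at roughly .62 and .64, so the second difference is slightly easier to detect even though both raw differences equal .30.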

But because values of h do not provide an intuitively 
descriptive scale, Rosenthal and Rubin (1989; Rosenthal, 
1991) have recently suggested a new index, π, which ap- 
plies specifically to one-sample, multiple-choice data of 
the kind obtained in ganzfeld experiments. In particular, 
π expresses all hit rates as the proportion of hits that 
would have been obtained if there had been only two 
equally likely alternatives—essentially a coin flip. Thus, π 
ranges from 0 to 1, with .5 expected under the null hy- 
pothesis. The formula is 

$$\pi = \frac{P(k-1)}{P(k-2) + 1}$$

where P is the raw proportion of hits and k is the number 
of alternative choices available. Because π has such a 
straightforward intuitive interpretation, we use it (or its 
conversion back to an equivalent four-alternative hit rate) 
throughout this article whenever it is applicable. 

For the 28 studies examined in the meta-analyses, the 
mean value of π was .62, with a 95% confidence interval 
from .55 to .69. This corresponds to a four-alternative hit 
rate of 35%, with a 95% confidence interval from 28% to 
43%. 
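
The conversion between a raw hit rate and π, and back to an equivalent four-alternative hit rate, can be sketched as follows. The inverse function is obtained here by solving the formula above for P; both function names are illustrative.

```python
def pi_index(hit_rate, k):
    """Rosenthal and Rubin's pi for a hit rate P among k alternatives."""
    return hit_rate * (k - 1) / (hit_rate * (k - 2) + 1)

def hit_rate_from_pi(pi, k):
    """Invert pi_index: the raw hit rate among k alternatives that yields pi."""
    return pi / ((k - 1) - pi * (k - 2))

# Chance performance with four alternatives maps to the coin-flip value .5.
print(pi_index(0.25, 4))
# A 35% four-alternative hit rate and the reported mean pi of .62.
print(round(pi_index(0.35, 4), 2), round(hit_rate_from_pi(0.62, 4), 2))
# Confidence limits of pi converted back to four-alternative hit rates.
print(round(hit_rate_from_pi(0.55, 4), 3), round(hit_rate_from_pi(0.69, 4), 3))
```

With k = 2 the index reduces to the raw hit rate itself, which is the sense in which π rescales every design to a two-alternative task.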

Cohen (1988, 1992) has also categorized effect sizes into 
small, medium, and large, with medium denoting an effect 
size that should be apparent to the naked eye of a careful 
observer. For a statistic such as π, which indexes the de- 
viation of a proportion from .5, Cohen considers .65 to be a 
medium effect size: A statistically unaided observer 
should be able to detect the bias of a coin that comes up 
heads on 65% of the trials. Thus, at .62, the psi ganzfeld 
effect size falls just short of Cohen's naked-eye criterion. 
From the phenomenology of the ganzfeld experimenter, 
the corresponding hit rate of 35% implies that he or she 
will see a subject obtain a hit approximately every third 
session rather than every fourth. 

It is also instructive to compare the psi ganzfeld effect 
with the results of a recent medical study that sought to 
determine whether aspirin can prevent heart attacks 
(Steering Committee of the Physicians’ Health Study Re- 
search Group, 1988). The study was discontinued after 6 
years because it was already clear that the aspirin treat- 
ment was effective (p < .00001) and it was considered un- 
ethical to keep the control group on placebo medication. 
The study was widely publicized as a major medical 
breakthrough. But despite its undisputed reality and 
practical importance, the size of the aspirin effect is quite 
small: Taking aspirin reduces the probability of suffering 
a heart attack by only .008. The corresponding effect size 
(h) is .068, about one third to one fourth the size of the psi 
ganzfeld effect (Atkinson et al., 1993, p. 236; Utts, 1991b). 

In sum, we believe that the psi ganzfeld effect is large 
enough to be of both theoretical interest and potential 
practical importance. 


Experimental Correlates of the Psi Ganzfeld Effect 


We showed earlier that the technique of correlating 
variables with effect sizes across studies can help to as- 
sess whether methodological flaws might have produced 
artifactual positive outcomes. The same technique can be 
used more affirmatively to explore whether an effect 
varies systematically with conceptually relevant varia- 
tions in experimental procedure. The discovery of such 
correlates can help to establish an effect as genuine, sug- 
gest ways of increasing replication rates and effect sizes, 
and enhance the chances of moving beyond the simple 
demonstration of an effect to its explanation. This strat- 
egy is only heuristic, however. Any correlates discovered 
must be considered quite tentative, both because they 
emerge from post hoc exploration and because they neces- 
sarily involve comparisons across heterogeneous studies 
that differ simultaneously on many interrelated variables, 
known and unknown. Two such correlates emerged from 
the meta-analyses of the psi ganzfeld effect. 

Single- versus multiple-image targets. Although most of 
the 28 studies in the meta-analysis used single pictures as 
targets, 9 (conducted by three different investigators) 
used View Master stereoscopic slide reels that presented 
multiple images focused on a central theme. Studies using 
the View Master reels produced significantly higher hit 
rates than did studies using the single-image targets (50% 
vs. 34%), t(26) = 2.22, p = .035, two-tailed. 

Sender-receiver pairing. In 17 of the 28 studies, partici- 
pants were free to bring in friends to serve as senders. In 
8 studies, only laboratory-assigned senders were used. 
(Three studies used no sender.) Unfortunately, there is no 
record of how many participants in the former studies ac- 
tually brought in friends. Nevertheless, those 17 studies 
(conducted by six different investigators) had significantly 
higher hit rates than did the studies that used only labo- 
ratory-assigned senders (44% vs. 26%), t(23) = 2.39, p = 
.025, two-tailed. 


The Joint Communiqué 


After their published exchange in 1985, Hyman and 
Honorton agreed to contribute a joint communiqué to the 
subsequent discussion that was published in 1986. First 
they set forth their areas of agreement and disagreement: 


We agree that there is an overall significant effect in this 
data base that cannot reasonably be explained by selective 
reporting or multiple analysis. We continue to differ over 
the degree to which the effect constitutes evidence for psi, 
but we agree that the final verdict awaits the outcome of fu- 
ture experiments conducted by a broader range of investiga- 

Randomization. The random selection of the target and 
sequencing of the judging set were controlled by a noise- 
based random number generator interfaced to the com- 
puter. Extensive testing confirmed that the generator was 
providing a uniform distribution of values throughout the 
full target range (1-160). Tests on the actual frequencies 
observed during the experiments confirmed that targets 
were, on average, selected uniformly from among the 4 
clips within each target set and that the 4 judging se- 
quences used were uniformly distributed across sessions. 
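
One common way to carry out a uniformity check of this kind is a chi-square goodness-of-fit test on the observed target frequencies. The source does not say which test was used, so the following is only a sketch under that assumption; the counts are hypothetical, and the closed-form tail probability applies only to 3 degrees of freedom (four categories).

```python
import math

def chi_square_uniform(counts):
    """Chi-square statistic for observed counts against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

# Hypothetical counts of how often each of 4 clips in a set served as target.
observed = [22, 27, 25, 26]
stat = chi_square_uniform(observed)
print(f"chi-square = {stat:.2f}, p = {chi2_sf_df3(stat):.3f}")
```

A large p value from such a test is consistent with uniform selection, although it cannot prove it.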

Additional control features. The receiver's and sender's 
rooms were sound-isolated, electrically shielded chambers 
with single-door access that could be continuously moni- 
tored by the experimenter. There was two-way intercom 
communication between the experimenter and the re- 
ceiver but only one-way communication into the sender’s 
room; thus, neither the experimenter nor the receiver 
could monitor events inside the sender's room. The 
archival record for each session includes an audiotape 
containing the receiver’s mentation during the ganzfeld 
period and all verbal exchanges between the experimenter 
and the receiver throughout the experiment. 

The automated ganzfeld protocol has been examined by 
several dozen parapsychologists and behavioral re- 
searchers from other fields, including well-known critics 
of parapsychology. Many have participated as subjects or 
observers. All have expressed satisfaction with the han- 
dling of security issues and controls. 

Parapsychologists have often been urged to employ ma- 
gicians as consultants to ensure that the experimental 
protocols are not vulnerable either to inadvertent sensory 
leakage or to deliberate cheating. Two “mentalists,” magi- 
cians who specialize in the simulation of psi, have exam- 
ined the autoganzfeld system and protocol. Ford Kross, a 
professional mentalist and officer of the mentalist’s pro- 
fessional organization, the Psychic Entertainers Associa- 
tion, provided the following written statement: “In my pro- 
fessional capacity as a mentalist, I have reviewed Psy- 
chophysical Research Laboratories’ automated ganzfeld 
system and found it to provide excellent security against 
deception by subjects” (personal communication, May, 
1989). 

Daryl J. Bem has also performed as a mentalist for 
many years and is a member of the Psychic Entertainers 
Association. As mentioned in the author note, this article 
had its origins in a 1983 visit he made to Honorton's labo- 
ratory, where he was asked to critically examine the re- 
search protocol from the perspective of a mentalist, a re- 
search psychologist, and a subject. Needless to say, this 
article would not exist if he did not concur with Ford 
Kross’s assessment of the security procedures. 


Experimental Studies5 


Altogether, 100 men and 140 women participated as re- 
ceivers in 354 sessions during the research program. The 
participants ranged in age from 17 to 74 years (M = 37.8, 
SD = 11.8), with a mean formal education of 15.6 years 
(SD = 2.0). Eight separate experimenters, including Hon- 
orton, conducted the studies. 




5A recent review of the original computer files uncovered a 
duplicate record in the autoganzfeld database. This has now been 
eliminated, reducing by one the number of subjects and sessions. 
As a result, some of the numbers presented in this article differ 
slightly from those in Honorton et al. (1990). 


The experimental program included three pilot and 
eight formal studies. Five of the formal studies used 
novice (first-time) participants who served as the receiver 
in one session each. The remaining three formal studies 
used experienced participants. 

Pilot studies. Sample sizes were not preset in the three 
pilot studies. Study 1 comprised 22 sessions and was con- 
ducted during the initial development and testing of the 
autoganzfeld system. Study 2 comprised 9 sessions testing 
a procedure in which the experimenter, rather than the 
receiver, served as the judge at the end of the session. 
Study 3 comprised 35 sessions and served as practice for 
participants who had completed the allotted number of 
sessions in the ongoing formal studies but who wanted 
additional ganzfeld experience. This study also included 
several demonstration sessions when TV film crews were 
present. 

Novice Studies. Studies 101-104 were each designed to 
test 50 participants who had had no prior ganzfeld experi- 
ence; each participant served as the receiver in a single 
ganzfeld session. Study 104 included 16 of 20 students re- 
cruited from the Juilliard School in New York City to test 
an artistically gifted sample. Study 105 was initiated to 
accommodate the overflow of participants who had been 
recruited for Study 104, including the four remaining Juil- 
liard students. The sample size for this study was set to 
25, but only 6 sessions had been completed when the labo- 
ratory closed. For purposes of exposition, we divided the 
56 sessions from Studies 104 and 105 into two parts: 
Study 104/105(a) comprises the 36 non-Juilliard partici- 
pants and Study 104/105(b) comprises the 20 Juilliard 
students. 

Study 201. This study was designed to retest the most 
promising participants from the previous studies. The 
number of trials was set to 20, but only 7 sessions with 3 
participants had been completed when the laboratory 
closed. 


Study 301. This study was designed to compare static 
and dynamic targets. The sample size was set to 50 ses- 
sions. Twenty-five experienced participants each served 
as the receiver in 2 sessions. Unknown to the participants, 
the computer control program was modified to ensure that 
they would each have 1 session with a static target and 1 
session with a dynamic target. 

Study 302. This study was designed to examine a dy- 
namic target set that had yielded a particularly high hit 
rate in the previous studies. The study involved experi- 
enced participants who had had no prior experience with 
this particular target set and who were unaware that only 
one target set was being sampled. Each served as the re- 
ceiver in a single session. The design called for the study 
to continue until 15 sessions were completed with each of 
the targets, but only 25 sessions had been completed 
when the laboratory closed. 

The 11 studies just described comprise all sessions con- 
ducted during the 6.5 years of the program. There is no 
“file drawer” of unreported sessions. 


Results 


Overall hit rate. As in the earlier meta-analysis, re- 
ceivers’ ratings were analyzed by tallying the proportion 
of hits achieved and calculating the exact binomial proba- 
bility for the observed number of hits compared with the 
chance expectation of .25. As noted earlier, 240 partici- 

Table 2 

Study 302: Expected Hit Rate and Proportion of Sessions in Which Each Video Clip Was Ranked First When It Was a Target and 
When It Was a Decoy 

Columns: Video Clip; Relative Frequency as Target; Relative Frequency of First Place Ranking; Expected Hit Rate (%); 
Ranked First When Target; Ranked First When Decoy; Difference 

Rows: Tidal Wave; Snakes; Sex Scene; Bugs Bunny 

[Most cell values are illegible in the scan. Legible fragments: Tidal Wave, relative frequency as target .28; 
unassigned entries 6.72, .57, .82, 9/11.] 



sessions of a study are more successful than later ses- 
sions. If there were such an effect, then studies with fewer 
sessions would show larger effect sizes because they 
would end before a decline could set in. To check this pos- 
sibility, we computed point-biserial correlations between 
hits (1) or misses (0) and the session number within each 
of the 10 studies. All of the correlations hovered around 
zero; six were positive, four were negative, and the overall 
mean was .01. 

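
A point-biserial correlation is simply a Pearson correlation in which one variable is coded 0 or 1. The following minimal sketch shows the computation for a single study; the hit sequence is hypothetical and serves only to illustrate the procedure, not to reproduce any autoganzfeld data.

```python
import math

def point_biserial(binary, values):
    """Pearson correlation between a 0/1 variable and a numeric variable."""
    n = len(binary)
    mean_b = sum(binary) / n
    mean_v = sum(values) / n
    cov = sum((b - mean_b) * (v - mean_v) for b, v in zip(binary, values))
    ss_b = sum((b - mean_b) ** 2 for b in binary)
    ss_v = sum((v - mean_v) ** 2 for v in values)
    return cov / math.sqrt(ss_b * ss_v)

# Hypothetical study: 1 = hit, 0 = miss, in session order.
hits = [1, 0, 0, 1, 0, 1, 0, 0]
session_numbers = list(range(1, len(hits) + 1))
r = point_biserial(hits, session_numbers)
print(round(r, 3))
```

A decline effect would appear as a reliably negative correlation; values scattered around zero, as reported above, give no indication of one.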
An inspection of Table 1 reveals that the negative corre- 
lation derives primarily from the two studies with the 
largest effect sizes: the 20 sessions with the Juilliard stu- 
dents and the 7 sessions of Study 201, the study specifi- 
cally designed to retest the most promising participants 
from the previous studies. Accordingly, it seems likely 
that the larger effect sizes of these two studies—and 
hence the significant negative correlation between the 
number of sessions and the effect size—reflect genuine 
performance differences between these two small, highly 
selected samples and other autoganzfeld participants. 

Study 302. All of the studies except Study 302 randomly 
sampled from a pool of 160 static and dynamic targets. 
Study 302 sampled from a single, dynamic target set that 
had yielded a particularly high hit rate in the previous 
studies. The four film clips in this set consisted of a scene 
of a tidal wave from the movie Clash of the Titans, a high- 
speed sex scene from A Clockwork Orange, a scene of 
crawling snakes from a TV documentary, and a scene 
from a Bugs Bunny cartoon. 

The experimental design called for this study to con- 
tinue until each of the clips had served as the target 15 
times. Unfortunately, the premature termination of this 
study at 25 sessions left an imbalance in the frequency 
with which each clip had served as the target. This means 
that the high hit rate observed (64%) could well be in- 
flated by response biases. 


As an illustration, water imagery is frequently reported 
by receivers in ganzfeld sessions whereas sexual imagery 
is rarely reported. (Some participants are probably reluc- 
tant both to report sexual imagery and to give the highest 
rating to the sex-related clip.) If a video clip containing 
popular imagery (such as water) happens to appear as a 
target more frequently than a clip containing unpopular 
imagery (such as sex), a high hit rate might simply reflect 
the coincidence of those frequencies of occurrence with 
participants’ response biases. And, as the second column 
of Table 2 reveals, the tidal wave clip did in fact appear 
more frequently as the target than did the sex clip. More 
generally, the second and third columns of Table 2 show 
that the frequency with which each film clip was ranked 
first closely matches the frequency with which each ap- 
peared as the target. 

One can adjust for this problem by using the observed 
frequencies in these two columns to compute the hit rate 
expected if there were no psi effect. In particular, one can 
multiply each proportion in the second column by the cor- 
responding proportion in the third column—yielding the 
joint probability that the clip was the target and that it 
was ranked first—and then sum across the four clips. As 
shown in the fourth column of Table 2, this computation 
yields an overall expected hit rate of 34.08%. When the 
observed hit rate of 64% is compared with this baseline, 
the effect size (h) is .61. As shown in Table 1, this is 
equivalent to a four-alternative hit rate of 54%, or a π 
value of .78, and is statistically significant (z = 3.04, p = 
.0012). 
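
The adjustment just described can be sketched in a few lines. The first function implements the sum of joint probabilities; because the per-clip frequencies are not reproduced here, it is demonstrated only on a uniform case, and the remaining steps start from the reported baseline of 34.08%. The relation z = h√n for a one-sample arcsine test and all function names are assumptions of this sketch.

```python
import math

def expected_hit_rate(target_freqs, first_rank_freqs):
    """Sum over clips of P(clip is target) * P(clip is ranked first)."""
    return sum(t * r for t, r in zip(target_freqs, first_rank_freqs))

def cohens_h(p_obs, p_null):
    """Cohen's h between an observed and a baseline proportion."""
    return 2 * math.asin(math.sqrt(p_obs)) - 2 * math.asin(math.sqrt(p_null))

def equivalent_hit_rate(h, baseline=0.25):
    """Hit rate that lies h arcsine units above the given chance baseline."""
    phi = 2 * math.asin(math.sqrt(baseline)) + h
    return math.sin(phi / 2) ** 2

def pi_index(hit_rate, k):
    """Rosenthal and Rubin's pi for a hit rate among k alternatives."""
    return hit_rate * (k - 1) / (hit_rate * (k - 2) + 1)

# With no imbalance and no response bias the baseline is the usual .25.
print(expected_hit_rate([0.25] * 4, [0.25] * 4))

h = cohens_h(0.64, 0.3408)        # observed 64% vs. adjusted baseline
z = h * math.sqrt(25)             # 25 sessions in Study 302
four_alt = equivalent_hit_rate(h)
print(round(h, 2), round(z, 2), round(four_alt, 2),
      round(pi_index(four_alt, 4), 2))
```

Under these assumptions the sketch returns values close to the reported h, z, equivalent hit rate, and π.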

The psi effect can be seen even more clearly in the re- 
maining columns of Table 2, which control for the differ- 
ential popularity of the imagery in the clips by displaying 
how frequently each was ranked first when it was the tar- 
get compared with how frequently it was ranked first 
when it was one of the control clips (decoys). As can be 
seen, each of the four clips was selected as the target rel- 
atively more frequently when it was the target than when 
it was a decoy, a difference that is significant for three of 
the four clips. On average, a clip was identified as the tar- 
get 58% of the time when it was the target and only 14% 
of the time when it was a decoy. 

ory (z = 2.23, p < .05, two-tailed). You now have cause to run 
an additional group of 10 subjects. What do you think the 
probability is that the results will be significant, by a one- 
tailed test, separately for this group? (p. 105) 


The median estimate was .85, with 9 out of 10 respon- 
dents providing an estimate greater than .60. The correct 
answer is approximately .48. 
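
The correct answer can be approximated by treating the observed effect as the true effect and asking how often a smaller sample would exceed the one-tailed criterion. The sketch below makes two assumptions that should be stated plainly: the original sample size of 20 is taken from Tversky and Kahneman's question as originally published, because the opening of the quoted passage is not reproduced above, and a normal approximation is used throughout.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def replication_power(z_observed, n_original, n_replication, z_critical=1.645):
    """P(replication is significant, one-tailed) if the observed effect is true."""
    # The expected z scales with the square root of the sample size.
    expected_z = z_observed * math.sqrt(n_replication / n_original)
    return 1 - normal_cdf(z_critical - expected_z)

# n_original = 20 is an assumption (see the lead-in); 10 is given in the quote.
power = replication_power(2.23, n_original=20, n_replication=10)
print(round(power, 2))
```

The result is a little under one half, in line with the figure cited above and far below the median intuition of .85.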

As Rosenthal (1990) has warned: “Given the levels of 
statistical power at which we normally operate, we have 
no right to expect the proportion of significant results that 
we typically do expect, even if in nature there is a very 
real and very important effect” (p. 16). In this regard, it is 
again instructive to consider the medical study that found 
a highly significant effect of aspirin on the incidence of 
heart attacks. The study monitored more than 22,000 
subjects. Had the investigators monitored 3,000 subjects, 
they would have had less than an even chance of finding a 
conventionally significant effect. Such is life with small ef- 
fect sizes. 

Given its larger effect size, the prospects for success- 
fully replicating the psi ganzfeld effect are not quite so 
daunting, but they are probably still grimmer than intu- 
ition would suggest. If the true hit rate is in fact about 
34% when 25% is expected by chance, then an experiment 
with 30 trials (the mean for the 28 studies in the original 
meta-analysis) has only about 1 chance in 6 of finding an 
effect significant at the .05 level with a one-tailed test. A 
50-trial experiment boosts that chance to about 1 in 3. 
One must escalate to 100 trials in order to come close to 
the break-even point, at which one has a 50-50 chance of 
finding a statistically significant effect (Utts, 1986). 
(Recall that only 2 of the 11 autoganzfeld studies yielded 
results that were individually significant at the conven- 
tional .05 level.) Those who require that a psi effect be 
statistically significant every time before they will seri- 
ously entertain the possibility that an effect really exists 
know not what they ask. 
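
The power figures in this passage can be explored with an exact binomial calculation. This is a sketch, not Utts's (1986) original procedure: it assumes a one-tailed exact binomial test at the .05 level, and because the binomial distribution is discrete, the results will agree with the quoted fractions only approximately.

```python
import math

def binom_upper_tail(c, n, p):
    """Exact P(X >= c) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(c, n + 1))

def critical_hits(n, p_null=0.25, alpha=0.05):
    """Smallest number of hits whose upper-tail probability is <= alpha."""
    for c in range(n + 2):
        if binom_upper_tail(c, n, p_null) <= alpha:
            return c

def power(n, p_true=0.34, p_null=0.25, alpha=0.05):
    """Probability of a significant result when the true hit rate is p_true."""
    return binom_upper_tail(critical_hits(n, p_null, alpha), n, p_true)

for n_trials in (30, 50, 100):
    print(n_trials, critical_hits(n_trials), round(power(n_trials), 2))
```

Whatever the exact values, the pattern is the one described in the text: at a true hit rate of 34%, a study of typical size is more likely than not to miss the .05 criterion.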


Significance Versus Effect Size 


The preceding discussion is unduly pessimistic, how- 
ever, because it perpetuates the tradition of worshipping 
the significance level. Regular readers of this journal are 
likely to be familiar with recent arguments imploring be- 
havioral scientists to overcome their slavish dependence 
on the significance level as the ultimate measure of virtue 
and instead to focus more of their attention on effect sizes: 
“Surely, God loves the .06 nearly as much as the .05” 
(Rosnow & Rosenthal, 1989, p. 1277). Accordingly, we 
suggest that achieving a respectable effect size with a 
methodologically tight ganzfeld study would be a perfectly 
welcome contribution to the replication effort, no matter 
how untenurable the p level renders the investigator. 

Career consequences aside, this suggestion may seem 
quite counterintuitive. Again, Tversky and Kahneman 
(1971) have provided an elegant demonstration. They 
asked several of their colleagues to consider an investiga- 
tor who runs 15 subjects and obtains a significant t value 
of 2.46. Another investigator attempts to duplicate the 
procedure with the same number of subjects and obtains a 
result in the same direction but with a nonsignificant 
value of t. Tversky and Kahneman then asked their col- 
leagues to indicate the highest level of t in the replication
study they would describe as a failure to replicate. The
majority of their colleagues regarded t = 1.70 as a failure
to replicate. But if the data from two such studies (t = 2.46
and t = 1.70) were pooled, the t for the combined data
would be about 3.00 (assuming equal variances):


Thus, we are faced with a paradoxical state of affairs, in
which the same data that would increase our confidence in
the finding when viewed as part of the original study, shake
our confidence when viewed as an independent study.
(Tversky & Kahneman, 1971, p. 108)
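
The pooled value of about 3.00 can be approximated in a few lines. This is a rough sketch, assuming both studies have the same number of subjects and the same standard deviation, so that each t equals mean × sqrt(n) / s and the pooled t reduces to the sum of the individual t values divided by the square root of the number of studies. The function name is illustrative, and the small change in degrees of freedom is ignored.

```python
from math import sqrt


def pooled_t(t_values):
    """Approximate t for pooled data from equal-sized, equal-variance studies."""
    return sum(t_values) / sqrt(len(t_values))


# The two studies from the Tversky and Kahneman example.
print(round(pooled_t([2.46, 1.70]), 2))  # about 2.94
```

The result, roughly 2.94, agrees with the "about 3.00" quoted above and is clearly larger than either t taken alone.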


Such is the iron grip of the arbitrary .05. Pooling the 
data, of course, is what meta-analysis is all about. Ac- 
cordingly, we suggest that two or more laboratories could 
collaborate in a ganzfeld replication effort by conducting 
independent studies and then pooling them in meta-ana- 
lytic fashion, what one might call real-time meta-analy- 
sis. (Each investigator could then claim the pooled p 
level for his or her own curriculum vitae.) 


Maximizing Effect Size 


Rather than buying or borrowing larger sample sizes, 
those who seek to replicate the psi ganzfeld effect might 
find it more intellectually satisfying to attempt to maxi- 
mize the effect size by attending to the variables associ- 
ated with successful outcomes. Thus researchers who wish 
to enhance the chances of successful replication should 
use dynamic rather than static targets. Similarly we ad- 
vise using participants with the characteristics we have 
reported to be correlated with successful psi performance. 
Random college sophomores enrolled in introductory psy- 
chology do not constitute the optimal subject pool. 

Finally, we urge ganzfeld researchers to read carefully 
the detailed description of the warm social ambiance that 
Honorton et al. (1990) sought to create in the autoganzfeld
laboratory. We believe that the social climate created in 
psi experiments is a critical determinant of their success 
or failure. 


The Problem of “Other” Variables 


This caveat about the social climate of the ganzfeld ex- 
periment prompted one reviewer of this article to worry 
that this provided “an escape clause” that weakens the 
falsifiability of the psi hypothesis: “Until Bem and Hon- 
orton can provide operational criteria for creating a 
warm social ambiance, the failure of an experiment with 
otherwise adequate power can always be dismissed as 
due to a lack of warmth.” 

Alas, it is true; we devoutly wish it were otherwise. 
But the operation of unknown variables in moderating 
the success of replications is a fact of life in all of the sci-
ences. Consider, for example, an earlier article in this
journal by Spence (1964). He reviewed studies testing 
the straightforward derivation from Hullian learning 
theory that high-anxiety subjects should condition more 
strongly than low-anxiety subjects. This hypothesis was 
confirmed 94% of the time in Spence’s own laboratory at 
the University of Iowa but only 63% of the time in labo- 
ratories at other universities. In fact, Kimble and his as- 
sociates at Duke University and the University of North 
Carolina obtained results in the opposite direction in two 
of three experiments. 

In searching for a post hoc explanation, Spence (1964) 
noted that “a deliberate attempt was made in the Iowa 
studies to provide conditions in the laboratory that might 
elicit some degree of emotionality. Thus, the experi- 
menter was instructed to be impersonal and quite formal 
. . . and did not try to put [subjects] at ease or allay any
Crae, 1992), which assesses six different facets of the ex-
traversion-introversion factor.

The sender. In contrast to this information about the re- 
ceiver in psi experiments, virtually nothing is known 
about the characteristics of a good sender or about the ef- 
fects of the sender's relationship with the receiver. As has 
been shown, the initial suggestion from the meta-analysis
of the original ganzfeld database that psi performance
might be enhanced when the sender and receiver are
friends was not replicated at a statistically significant 
level in the autoganzfeld studies. 

A number of parapsychologists have entertained the 
more radical hypothesis that the sender may not even be a 
necessary element in the psi process. In the terminology of 
parapsychology, the sender-receiver procedure tests for
the existence of telepathy, anomalous communication be-
tween two individuals; however, if the receiver is somehow
picking up the information from the target itself, it would
be termed clairvoyance, and the presence of the sender
would be irrelevant (except for possible psychological rea-
sons such as expectation effects).

At the time of his death, Honorton was planning a se-
ries of autoganzfeld studies that would systematically
compare sender and no-sender conditions while keeping
both the receiver and the experimenter blind to the condi-
tion of the ongoing session. In preparation, he conducted a
meta-analytic review of ganzfeld studies that used no
sender. He found 12 studies with a median of 33.5 ses-
sions, conducted by seven investigators. The overall effect
size (π) was .56, which corresponds to a four-alternative
hit rate of 29%. But this effect size does not reach statisti-
cal significance (Stouffer z = 1.31, p = .095). So far, then,
there is no firm evidence for psi in the ganzfeld in the ab-
sence of a sender. (There are, however, nonganzfeld stud-
ies in the literature that do report significant evidence for
clairvoyance, including a classic card-guessing experiment
conducted by J. B. Rhine and Pratt [1954].)
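
The reported p value follows directly from the Stouffer z. A minimal sketch, assuming a one-tailed test against the standard normal distribution (the function name is illustrative):

```python
from math import erfc, sqrt


def one_tailed_p_from_z(z):
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * erfc(z / sqrt(2))


# Stouffer z reported for the 12 no-sender studies.
print(round(one_tailed_p_from_z(1.31), 3))  # 0.095
```

A z of 1.645 or more would be needed to reach the conventional one-tailed .05 level.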


The Physics of Psi 


The psychological level of theorizing discussed earlier
does not, of course, address the conundrum that makes psi
phenomena anomalous in the first place: their presumed 
incompatibility with our current conceptual model of 
physical reality. Parapsychologists differ widely from one 
another in their taste for theorizing at this level, but sev- 
eral whose training lies in physics or engineering have 
proposed physical (or biophysical) theories of psi phenom-
ena (an extensive review of theoretical parapsychology 
was provided by Stokes, 1987). Only some of these theo- 
ries would force a radical revision in our conception of 
physical reality. 

Those who follow contemporary debates in modern 
physics, however, will be aware that several phenomena
predicted by quantum theory and confirmed by experi- 
ment are themselves incompatible with our current con- 
ceptual model of physical reality. Of these, it is the 1982 
empirical confirmation of Bell’s theorem that has created 
the most excitement and controversy among philosophers 
and the few physicists who are willing to speculate on 
such matters (Cushing & McMullin, 1989; Herbert, 1987). 
In brief, Bell's theorem states that any model of reality 
that is compatible with quantum mechanics must be non- 
local: It must allow for the possibility that the results of 
observations at two arbitrarily distant locations can be 
correlated in ways that are incompatible with any physi- 
cally permissible causal mechanism. 


Several possible models of reality that incorporate non- 
locality have been proposed by both philosophers and
physicists. Some of these models clearly rule out psi-like 
information transfer, others permit it, and some actually 
require it. Thus, at a grander level of theorizing, some 
parapsychologists believe that one of the more radical 
models of reality compatible with both quantum mechan- 
ics and psi will eventually come to be accepted. If and 
when that occurs, psi phenomena would cease to be 
anomalous.

But we have learned that all such talk provokes most of 
our colleagues in psychology and in physics to roll their 
eyes and gnash their teeth. So let’s just leave it at that. 


Skepticism Revisited 


More generally, we have learned that our colleagues’ 
tolerance for any kind of theorizing about psi is strongly
determined by the degree to which they have been con- 
vinced by the data that psi has been demonstrated. We 
have further learned that their diverse reactions to the 
data themselves are strongly determined by their a priori 
beliefs about and attitudes toward a number of quite gen- 
eral issues, some scientific, some not. In fact, several
statisticians believe that the traditional hypothesis test- 
ing methods used in the behavioral sciences should be 
abandoned in favor of Bayesian analyses, which take into 
account a person’s a priori beliefs about the phenomenon
under investigation (e.g., Bayarri & Berger, 1991; Daw- 
son, 1991). 

In the final analysis, however, we suspect that both 
one’s Bayesian a prioris and one’s reactions to the data 
are ultimately determined by whether one was more
severely punished in childhood for Type I or Type II er- 
rors. 
Replication and Meta-Analysis in
Parapsychology


Jessica Utts


Abstract. Parapsychology, the laboratory study of psychic phenomena,
has had its history interwoven with that of statistics. Many of the
controversies in parapsychology have focused on statistical issues, and
statistical models have played an integral role in the experimental
work. Recently, parapsychologists have been using meta-analysis as a
tool for synthesizing large bodies of work. This paper presents an
overview of the use of statistics in parapsychology and offers a summary
of the meta-analyses that have been conducted. It begins with some
anecdotal information about the involvement of statistics and statisti-
cians with the early history of parapsychology. Next, it is argued that
most nonstatisticians do not appreciate the connection between power
and “successful” replication of experimental effects. Returning to para-
psychology, a particular experimental regime is examined by summariz-
ing an extended debate over the interpretation of the results. A new set
of experiments designed to resolve the debate is then reviewed. Finally,
meta-analyses from several areas of parapsychology are summarized. It
is concluded that the overall evidence indicates that there is an anoma-
lous effect in need of an explanation.


Key words and phrases: Effect size, psychic research, statistical contro- 
versies, randomness, vote-counting. 


1. INTRODUCTION 


In a June 1990 Gallup Poll, 49% of the 1236 
respondents claimed to believe in extrasensory per- 
ception (ESP), and one in four claimed to have had 
a personal experience involving telepathy (Gallup 
and Newport, 1991). Other surveys have shown 
even higher percentages; the University of 
Chicago’s National Opinion Research Center re- 
cently surveyed 1473 adults, of which 67% claimed 
that they had experienced ESP (Greeley, 1987). 

Public opinion is a poor arbiter of science, how- 
ever, and experience is a poor substitute for the 
scientific method. For more than a century, small
numbers of scientists have been conducting labora-
tory experiments to study phenomena such as
telepathy, clairvoyance and precognition, collec-
tively known as “psi” abilities. This paper will
examine some of that work, as well as some of the
statistical controversies it has generated.


Parapsychology, as this field is called, has been a
source of controversy throughout its history. Strong
beliefs tend to be resistant to change even in the
face of data, and many people, scientists included,
seem to have made up their minds on the question
without examining any empirical data at all. A
critic of parapsychology recently acknowledged that
“The level of the debate during the past 130 years
has been an embarrassment for anyone who would
like to believe that scholars and scientists adhere
to standards of rationality and fair play” (Hyman,
1985a, page 89). While much of the controversy has
focused on poor experimental design and potential 
fraud, there have been attacks and defenses of the 
statistical methods as well, sometimes calling into 
question the very foundations of probability and 
statistical inference. 

Most of the criticisms have been leveled by psy-
chologists. For example, a 1988 report of the U.S.
National Academy of Sciences concluded that “The
committee finds no scientific justification from
research conducted over a period of 130 years for
the existence of parapsychological phenomena”
One of the first American researchers to
use statistical methods in parapsychology was
John Edgar Coover, who was the Thomas Welton
Stanford Psychical Research Fellow in the Psychol-
ogy Department at Stanford University from 1912
to 1937 (Dommeyer, 1975). In 1917, Coover pub-
lished a large volume summarizing his work
(Coover, 1917). Coover believed that his results 
were consistent with chance, but others have ar- 
gued that Coover’s definition of significance was 
too strict (Dommeyer, 1975). For example, in one 
evaluation of his telepathy experiments, Coover 
found a two-tailed p-value of 0.0062. He concluded,
“Since this value, then, lies within the field of
chance deviation, although the probability of its
occurrence by chance is fairly low, it cannot be
accepted as a decisive indication of some cause
beyond chance which operated in favor of success in
guessing” (Coover, 1917, page 82). On the next
page, he made it explicit that he would require a
p-value of 0.0000221 to declare that something
other than chance was operating.

It was during the summer of 1930, with the 
card-guessing experiments of J. B. Rhine at Duke 
University, that parapsychology began to take hold 
as a laboratory science. Rhine’s laboratory still 
exists under the name of the Foundation for Re- 
search on the Nature of Man, housed at the edge of 
the Duke University campus. 

It wasn’t long after Rhine published his first
book, Extrasensory Perception, in 1934 that the
attacks on his methodology began. Since his claims
were wholly based on statistical analyses of his
experiments, the statistical methods were closely
scrutinized by critics anxious to find a conventional
explanation for Rhine's positive results.

The most persistent critic was a psychologist 
from McGill University named Chester Kellogg 
(Mauskopf and McVaugh, 1979). Kellogg’s main 
argument was that Rhine was using the binomial 
distribution (and normal approximation) on a se- 
ries of trials that were not independent. The experi- 
ments in question consisted of having a subject
guess the order of a deck of 25 cards, with five each
of five symbols, so technically Kellogg was correct.
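
The size of the dependence Kellogg pointed to can be worked out exactly. The sketch below is an illustration under stated assumptions, not a reconstruction of any published analysis: it assumes the subject commits to a full sequence of 25 guesses with no feedback, that the deck is thoroughly shuffled, and the function names are hypothetical. It compares the exact variance of the number of hits with the binomial variance.

```python
from fractions import Fraction


def closed_deck_variance(guess_counts, per_symbol=5):
    """Exact variance of hits when a fixed guess sequence meets a shuffled deck.

    guess_counts[i] is how many times symbol i is guessed; the deck holds
    per_symbol cards of each symbol.
    """
    n = len(guess_counts) * per_symbol
    assert sum(guess_counts) == n, "one guess per card is assumed"
    p_hit = Fraction(per_symbol, n)
    var_single = p_hit * (1 - p_hit)
    # Covariance of hit indicators at two positions with the same guess.
    cov_same = p_hit * Fraction(per_symbol - 1, n - 1) - p_hit**2
    # Covariance at two positions with different guesses.
    cov_diff = p_hit * Fraction(per_symbol, n - 1) - p_hit**2
    same_pairs = sum(c * (c - 1) for c in guess_counts)
    diff_pairs = n * (n - 1) - same_pairs
    return n * var_single + same_pairs * cov_same + diff_pairs * cov_diff


def binomial_variance(n=25, p=Fraction(1, 5)):
    """Variance of hits if the 25 trials were independent."""
    return n * p * (1 - p)


print(binomial_variance())                # 4
print(closed_deck_variance([5] * 5))      # 25/6, about 4.17
print(closed_deck_variance([25, 0, 0, 0, 0]))  # 0: always exactly 5 hits
```

When the guesses are spread evenly over the five symbols, the exact variance (25/6) is only slightly larger than the binomial value of 4, which is consistent with the observation that the correction did little to change the significance of the results.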

By 1937, several mathematicians and statis- 
ticians had come to Rhine’s aid. Mauskopf and 
McVaugh (1979) speculated that since statistics was 
itself a young discipline, “a number of statisticians 
were equally outraged by Kellogg, whose argu- 
ments they saw as discrediting their profession” 
(page 258). The major technical work, which ac- 
knowledged that Kellogg's criticisms were accurate 
but did little to change the significance of the 


and Greenwood, 1937). Stuart, who had been an 
undergraduate in mathematics at Duke, was one of 
Rhine's early subjects and continued to work with 
him as a researcher until Stuart’s death in 1947.
Greenwood was a Duke mathematician, who appar-
ently converted to a statistician at the urging of
Rhine.

Another prominent figure who was distressed
with Kellogg’s attack was E. V. Huntington, a
mathematician at Harvard. After corresponding
with Rhine, Huntington decided that, rather than
further confuse the public with a technical reply to
Kellogg's arguments, a simple statement should be
made to the effect that the mathematical issues in
Rhine’s work had been resolved. Huntington must
have successfully convinced his former student, 
Burton Camp of Wesleyan, that this was a wise 
approach. Camp was the 1937 President of IMS. 
When the annual meetings were held in December 
of 1937 (jointly with AMS and AAAS), Camp 
released a statement to the press that read: 


Dr. Rhine’s investigations have two aspects: 
experimental and statistical. On the exper- 
imental side mathematicians, of course, 
have nothing to say. On the statistical side, 
however, recent mathematical work has 
established the fact that, assuming that the 
experiments have been properly performed, 
the statistical analysis is essentially valid. If 
the Rhine investigation is to be fairly attacked, 
it must be on other than mathematical grounds 
[Camp, 1937].


One statistician who did emerge as a critic was 
William Feller. In a talk at the Duke Mathemati- 
cal Seminar on April 24, 1940, Feller raised three 
criticisms to Rhine’s work (Feller, 1940). They had 
been raised before by others (and continue to be 
raised even today). The first was that inadequate 
shuffling of the cards resulted in additional infor- 
mation from one series to the next. The second was 
what is now known as the “file-drawer effect,” 
namely, that if one combines the results of pub- 
lished studies only, there is sure to be a bias in 
favor of successful studies. The third was that the 
results were enhanced by the use of optional stop- 
ping, that is, by not specifying the number of trials 
in advance. All three of these criticisms were ad- 
dressed in a rejoinder by Greenwood and Stuart 
(1940), but Feller was never convinced. Even in its 
third edition published in 1968, his book An Intro- 
duction to Probability Theory and Its Applications 
still contains his conclusion about Greenwood and 


Stuart: “Both their arithmetic and their experi-
ments have a distinct tinge of the supernatural
to their colleagues at a professional meeting, with
the question:


An investigator has reported a result that you
consider implausible. He ran 15 subjects, and
reported a significant value, t = 2.46. Another
investigator has attempted to duplicate his pro-
cedure, and he obtained a nonsignificant value
of t with the same number of subjects. The
direction was the same in both sets of data.
You are reviewing the literature. What is the
highest value of t in the second set of data that
you would describe as a failure to replicate?
[1982, page 28].


In reporting their results, Tversky and Kahne-
man stated:


The majority of our respondents regarded t =
1.70 as a failure to replicate. If the data of two
such studies (t = 2.46 and t = 1.70) are pooled,
the value of t for the combined data is about
3.00 (assuming equal variances). Thus, we are
faced with a paradoxical state of affairs, in
which the same data that would increase our
confidence in the finding when viewed as part
of the original study, shake our confidence
when viewed as an independent study [1982,
page 28].


At a recent presentation to the History and Phi-
losophy of Science Seminar at the University of
California at Davis, I asked the following question.
Two scientists, Professors A and B, each have a
theory they would like to demonstrate. Each plans
to run a fixed number of Bernoulli trials and then
test H0: p = 0.25 versus Ha: p > 0.25. Professor A
has access to large numbers of students each
semester to use as subjects. In his first experiment,
he runs 100 subjects, and there are 33 successes
(p = 0.04, one-tailed). Knowing the importance of
replication, Professor A runs an additional 100 sub-
jects as a second experiment. He finds 36 successes
(p = 0.009, one-tailed).

Professor B only teaches small classes. Each
quarter, she runs an experiment on her students to
test her theory. She carries out ten studies this
way, with the results in Table 1.

I asked the audience by a show of hands to 
indicate whether or not they felt the scientists had 
successfully demonstrated their theories. Professor 
A’s theory received overwhelming support, with 
approximately 20 votes, while Professor B’s theory 
received only one vote. 

If you aggregate the results of the experiments 
for each professor, you will notice that each con- 


with 71 as opposed to 69 successful trials. The 
one-tailed p-values for the combined trials are 
0.0017 for Professor A and 0.0006 for Professor B. 
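
The p-values in this example can be approximated with an exact binomial tail. This is a minimal sketch, assuming an exact one-tailed binomial test against p = 0.25; the function name is illustrative. The 200-trial total for Professor B is an assumption here, since the trial counts are not legible in the table as reproduced, so that line is indicative only.

```python
from math import comb


def one_tailed_p(successes, trials, p_null=0.25):
    """Exact P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Professor A: two experiments of 100 trials each (figures from the text).
print(one_tailed_p(33, 100), one_tailed_p(36, 100))

# Pooled results: 69 of 200 for Professor A (from the text) and 71 successes
# for Professor B, with 200 trials ASSUMED for Professor B.
print(one_tailed_p(69, 200), one_tailed_p(71, 200))
```

Under that assumption, Professor B's pooled result is the more extreme of the two, even though none of her individual studies reaches the .05 level.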

To address the question of replication more ex- 
plicitly, I also posed the following scenario. In 
December of 1987, it was decided to prematurely
terminate a study on the effects of aspirin in reduc- 
ing heart attacks because the data were so convinc- 
ing (see, e.g., Greenhouse and Greenhouse, 1988; 
Rosenthal, 1990a). The physician-subjects had been 
randomly assigned to take aspirin or a placebo. 
There were 104 heart attacks among the 11,037 
subjects in the aspirin group, and 189 heart attacks 
among the 11,034 subjects in the placebo group 
(chi-square = 25.01, p < 0.00001). 

After showing the results of that study, I pre- 
sented the audience with two hypothetical experi- 
ments conducted to try to replicate the original 
result, with outcomes in Table 2. 

I asked the audience to indicate which one they 
thought was a more successful replication. The au- 
dience chose the second one, as would most journal 
editors, because of the “significant p-value.” In 
fact, the first replication has almost exactly the 
same proportion of heart attacks in the two groups 
as the original study and is thus a very close repli- 
cation of that result. The second replication has 


TABLE 1
Attempted replications for Professor B


Number of successes One-tailed p-value
0.22
0.15
0.23
0.17
0.20
0.18
0.14
0.08
0.31
0.21

TABLE 2
Hypothetical replications of the aspirin/heart attack study


             Replication #1        Replication #2
             Heart attack          Heart attack
             Yes      No           Yes      No
Aspirin      11       1156         20       2314
Placebo      19       1090         48       2170
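
The comparison between the original aspirin study and the two hypothetical replications can be reproduced from the counts given above. The sketch below assumes a Pearson chi-square without continuity correction, which matches the reported value of 25.01 for the original study; the function names are illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def rate_ratio(a, b, c, d):
    """Heart-attack rate in the placebo row divided by the aspirin-row rate."""
    return (c / (c + d)) / (a / (a + b))


# Rows are (aspirin yes, aspirin no, placebo yes, placebo no).
studies = {
    "original": (104, 11037 - 104, 189, 11034 - 189),
    "replication 1": (11, 1156, 19, 1090),
    "replication 2": (20, 2314, 48, 2170),
}

for name, cells in studies.items():
    print(name, round(chi_square_2x2(*cells), 2), round(rate_ratio(*cells), 2))
```

With one degree of freedom, a chi-square above 3.84 corresponds to p < .05. The first replication falls short of that threshold while reproducing the original rate ratio almost exactly; the second exceeds it but shows a noticeably different ratio.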
target, rather than being forced to make a choice
from a small discrete set of possibilities. Various 
types of target material have been used, including 
pictures, short segments of movies on video tapes, 
actual locations and small objects. 

Despite the more complex target material, the 
statistical methods used to analyze these experi- 
ments are similar to those for forced-choice experi- 
ments. A typical experiment proceeds as follows. 
Before conducting any trials, a large pool of poten- 
tial targets is assembled, usually in packets of four. 
Similarity of targets within a packet is kept to a 
minimum, for reasons made clear below. At the 
start of an experimental session, after the subject is 
sequestered in an isolated room, a target is selected 
at random from the pool. A sender is placed in 
another room with the target. The subject is asked 
to provide a verbal or written description of what 
he or she thinks is in the target, knowing only that 
it is a photograph, an object, etc. 

After the subject’s description has been recorded 
and secured against the potential for later alter- 
ation, a judge (who may or may not be the subject) 
is given a copy of the subject’s description and the 
four possible targets that were in the packet with 
the correct target. A properly conducted experi- 
ment either uses video tapes or has two identical 
sets of target material and uses the duplicate set 
for this part of the process, to ensure that clues 
such as fingerprints don’t give away the answer. 
Based on the subject’s description, and of course on 
a blind basis, the judge is asked to either rank the 
four choices from most to least likely to have been 
the target, or to select the one from the four that 
seems to best match the subject’s description. If 
ranks are used, the statistical analysis proceeds by 
summing the ranks over a series of trials and 
comparing the sum to what would be expected by 
chance. If the selection method is used, a “direct 
hit” occurs if the correct target is chosen, and the 
number of direct hits over a series of trials is 
compared to the number expected in a binomial 
experiment with p = 0.25. 

Note that the subjects’ responses cannot be con- 
sidered to be “random”’ in any sense, so probability 
assessments are based on the random selection of 
the target and decoys. In a correctly designed ex- 
periment, the probability of a direct hit by chance 
is 0.25 on each trial, regardless of the response, and 
the trials are independent. These and other issues 
related to analyzing free-response experiments are 
discussed by Utts (1991). 
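
The sum-of-ranks analysis described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of any particular study: it assumes rank 1 means "judged most likely to be the target," that under chance the correct target's rank is equally likely to be 1, 2, 3, or 4 on each independent trial, and it uses a normal approximation to the sum. The rank data are hypothetical.

```python
from math import sqrt


def sum_of_ranks_z(target_ranks, choices=4):
    """z-score for the summed ranks given to the correct target.

    Positive values mean the targets were ranked better (lower) than chance.
    """
    n = len(target_ranks)
    mean_rank = (choices + 1) / 2        # 2.5 for four choices
    var_rank = (choices**2 - 1) / 12     # 1.25 for four choices
    return (n * mean_rank - sum(target_ranks)) / sqrt(n * var_rank)


# Hypothetical ranks assigned to the correct target over ten trials.
example_ranks = [1, 2, 1, 3, 1, 4, 2, 1, 2, 3]
print(round(sum_of_ranks_z(example_ranks), 2))
```

The direct-hit method uses only whether the rank is 1, whereas the sum of ranks also gives partial credit for second and third choices.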


4.2 The Psi Ganzfeld Experiments


isolation technique originally developed by Gestalt 
psychologists for other purposes. Evidence from 
spontaneous case studies and experimental work 
had led parapsychologists to a model proposing that 
psychic functioning may be masked by sensory in- 
put and by inattention to internal states (Honorton, 
1977). The ganzfeld procedure was specifically de-
signed to test whether or not reduction of external 
“noise” would enhance psi performance. 

In these experiments, the subject is placed in a 
comfortable reclining chair in an acoustically 
shielded room. To create a mild form of sensory 
deprivation, the subject wears headphones through 
which white noise is played, and stares into a 
constant field of red light. This is achieved by 
taping halved translucent ping-pong balls over the 
eyes and then illuminating the room with red light. 
In the psi ganzfeld experiments, the subject speaks 
into a microphone and attempts to describe the 
target material being observed by the sender in a 
distant room. 

At the 1982 Annual Meeting of the Parapsycho- 
logical Association, a debate took place over the 
degree to which the results of the psi ganzfeld 
experiments constituted evidence of psi abilities. 
Psychologist and critic Ray Hyman and parapsy- 
chologist Charles Honorton each analyzed the re- 
sults of all known psi ganzfeld experiments to date, 
and they reached strikingly different conclusions
(Honorton, 1985b; Hyman, 1985b). The debate con- 
tinued with the publication of their arguments in 
separate articles in the March 1985 issue of the 
Journal of Parapsychology. Finally, in the Decem- 
ber 1986 issue of the Journal of Parapsychology, 
Hyman and Honorton (1986) wrote a joint article 
in which they highlighted their agreements and 
disagreements and outlined detailed criteria for 
future experiments. That same issue contained 
commentaries on the debate by 10 other authors. 

The data base analyzed by Hyman and Honorton 
(1986) consisted of results taken from 34 reports 
written by a total of 47 authors. Honorton counted 
42 separate experiments described in the reports, of 
which 28 reported enough information to determine 
the number of direct hits achieved. Twenty three of 
the studies (55%) were classified by Honorton as 
having achieved statistical significance at 0.05. 


4.3 The Vote-Counting Debate 


Vote-counting is the term commonly used for the 
technique of drawing inferences about an experi- 
mental effect by counting the number of significant 
versus nonsignificant studies of the effect. Hedges 
and Olkin (1985) give a detailed analysis of the 
inadequacy of this method, showing that it is more


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 



ization, multiple tests used without adjusting the 
significance level (thus inflating the significance 
level from the nominal 5%) and failure to use a 
duplicate set of targets for the judging process (thus 
allowing possible clues such as fingerprints). Using 
cluster and factor analyses, the 12 binary flaw 
variables were combined into three new variables, 
which Hyman named General Security, Statistics 
and Controls. 

Several analyses were then conducted. The one 
reported with the most detail is a factor analysis 
utilizing 17 variables for each of 36 studies. Four
factors emerged from the analysis. From these,
Hyman concluded that security had increased over 
the years, that the significance level tended to be 
inflated the most for the most complex studies and 
that both effect size and level of significance were 
correlated with the existence of flaws. 

Following his factor analysis, Hyman picked the 
three flaws that seemed to be most highly corre- 
lated with success, which were inadequate atten- 
tion to both randomization and documentation and 
the potential for ordinary communication between 
the sender and receiver. A regression equation was 
then computed using each of the three flaws as 
dummy variables, and the effect size for the experi- 
ment as the dependent variable. From this equa- 
tion, Hyman concluded that a study without these 
three flaws would be predicted to have a hit rate of 
27%. He concluded that this is “well within the 
statistical neighborhood of the 25% chance rate”
(1985b, page 37), and thus “the ganzfeld psi data
base, despite initial impressions, is inadequate ei-
ther to support the contention of a repeatable study
or to demonstrate the reality of psi” (page 38). 
Honorton discounted both Hyman’s flaw classifi- 
cation and his analysis. He did not deny that flaws 
existed, but he objected that Hyman’s analysis was 
faulty and impossible to interpret. Honorton asked 
psychometrician David Saunders to write an Ap- 
pendix to his article, evaluating Hyman’s analysis. 
Saunders first criticized Hyman’s use of a factor 
analysis with 17 variables (many of which were 
dichotomous) and only 36 cases and concluded that 
“the entire analysis is meaningless” (Saunders, 
1985, page 87). He then noted that Hyman’s choice 
of the three flaws to include in his regression anal- 
ysis constituted a clear case of multiple analysis, 
since there were 84 possible sets of three that could 
have been selected (out of nine potential flaws), and 
Hyman chose the set most highly correlated with 
effect size. Again, Saunders concluded that “any 
interpretation drawn from [the regression analysis] 
must be regarded as meaningless” (1985, page 88). 
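
Saunders's count of 84 possible sets is the number of ways to choose three flaws from nine. A brief check, with hypothetical flaw labels standing in for the actual categories:

```python
from itertools import combinations
from math import comb

# Placeholder names; the real flaw categories are not reproduced here.
flaws = [f"flaw_{i}" for i in range(1, 10)]
triples = list(combinations(flaws, 3))

print(len(triples), comb(9, 3))  # 84 84
```

Selecting the single triple most strongly correlated with effect size therefore amounts to choosing the best of 84 candidate regressions.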


Hyman in his capacity as Chair of the National 
Academy of Sciences’ Subcommittee on Parapsy- 
chology. Using Hyman’s flaw classifications and a 
multivariate analysis, Harris and Rosenthal con- 
cluded that “Our analysis of the effects of flaws on 
study outcome lends no support to the hypothesis 
that ganzfeld research results are a significant 
function of the set of flaw variables” (1988b, 
page 3).

Hyman and Honorton were in the process of 
preparing papers for a second round of debate when 
they were invited to lunch together at the 1986 
Meeting of the Parapsychological Association. They 
discovered that they were in general agreement on 
several major issues, and they decided to coauthor 
a “Joint Communiqué” (Hyman and Honorton, 
1986). It is clear from their paper that they both 
thought it was more important to set the stage for 
future experimentation than to continue the techni- 
cal arguments over the current data base. In the 
abstract to their paper, they wrote: 


We agree that there is an overall significant 
effect in this data base that cannot reasonably 
be explained by selective reporting or multiple 
analysis. We continue to differ over the degree 

to which the effect constitutes evidence for psi,
but we agree that the final verdict awaits the 
outcome of future experiments conducted by a 
broader range of investigators and according to 
more stringent standards [page 351]. 


The paper then outlined what these standards 
should be. They included controls against any kind 
of sensory leakage, thorough testing and documen- 
tation of randomization methods used, better re- 
porting of judging and feedback protocols, control 
for multiple analyses and advance specification of 
number of trials and type of experiment. Indeed, 
any area of research could benefit from such a 
careful list of procedural recommendations. 


4.5 Rosenthal’s Meta-Analysis 


The same issue of the Journal of Parapsychology 
in which the Joint Communiqué appeared also car- 
ried commentaries on the debate by 10 separate 
authors. In his commentary, psychologist Robert 
Rosenthal, one of the pioneers of meta-analysis in 
psychology, summarized the aspects of Hyman’s 
and Honorton’s work that would typically be in- 
cluded in a meta-analysis (Rosenthal, 1986). It is 
worth reviewing Rosenthal’s results so that they 
can be used as a basis of comparison for the more 
recent psi ganzfeld studies reported in Section 5. 

Rosenthal, like Hyman and Honorton, focused

likely to be selected by the computer’s random
number generator than any of the others in the set.
The selection of the target by the computer is the 
only source of randomness in these experiments. 
This is an important point, and one that is often 
misunderstood. (See Utts, 1991, for elucidation.) 

Eighty of the targets were “dynamic,” consisting 
of scenes from movies, documentaries and cartoons; 
80 were “static,” consisting of photographs, art 
prints and advertisements. The four targets within 
each set were all of the same type. Earlier studies 
indicated that dynamic targets were more likely to 
produce successful results, and one of the goals of 
the new experiments was to test that theory.

The randomization procedure used to select the 
target and the order of presentation for judging was 
thoroughly tested before and during the experi- 
ments. A detailed description is given by Honorton 
et al. (1990, pages 118-120). 

Three of the 11 series were pilot series, five were
formal series with novice receivers, and three were
formal series with experienced receivers. The last
series with experienced receivers was the only one
that did not use the 160 targets. Instead, it used
only one set of four dynamic targets in which one
target had previously received several first place
ranks and one had never received a first place
rank. The receivers, none of whom had had prior
exposure to that target pack, were not aware that
only one target pack was being used. They each
contributed one session only to the series. This will
be called the “special series” in what follows.

Except for two of the pilot series, numbers of
trials were planned in advance for each series.
Unfortunately, three of the formal series were not
yet completed when the funding ran out, including
the special series, and one pilot study with advance
planning was terminated early when the experimenter
relocated. There were no unreported trials
during the 6-year period under review, so there was
no “file drawer.”

Overall, there were 183 Rs who contributed only 
one trial and 58 who contributed more than one, for 
a total of 241 participants and 355 trials. Only 23 
Rs had previously participated in ganzfeld experi- 
ments, and 194 Rs (81%) had never participated in 
any parapsychological research. 


5.2 Results 


While acknowledging that no probabilistic con- 
clusions can be drawn from qualitative data, Hon- 
orton et al. (1990) included several examples of 
session excerpts that Rs identified as providing the
basis for their target rating. To give a flavor for the 


rank, the first example is reproduced here. The 
target was a painting by Salvador Dali called 
“Christ Crucified.” The correct target received a 
first place rank. The part of the mentation R used
to make this assessment read:


...I think of guides, like spirit guides, leading
me and I come into a court with a king. It’s
quiet....It’s like heaven. The king is some-
thing like Jesus. Woman. Now I’m just sort of
summersaulting through heaven....
Brooding....Aztecs, the Sun God....High
priest....Fear....Graves. Woman.
Prayer....Funeral....Dark.
Death....Souls....Ten Commandments.
Moses....[Honorton et al., 1990].


Over all 11 series, there were 122 direct hits in
the 355 trials, for a hit rate of 34.4% (exact bino-
mial p-value = 0.00005) when 25% were expected
by chance. Cohen’s h is 0.20, and a 95% confidence
interval for the overall hit rate is from 0.30 to 0.39.
This calculation assumes, of course, that the proba-
bility of a direct hit is constant and independent
across trials, an assumption that may be question-
able except under the null hypothesis of no psi
abilities.
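
The summary statistics quoted above can be recomputed from the two counts alone. The sketch below is one possible way to do so, not the authors' own computation: it assumes Cohen's h is the difference of arcsine-transformed proportions, takes the exact p-value as the one-sided upper binomial tail, and uses a normal-approximation (Wald) interval, which may differ slightly from the method behind the published interval.

```python
from math import asin, comb, sqrt

hits, trials, chance = 122, 355, 0.25
hit_rate = hits / trials


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


h = cohens_h(hit_rate, chance)
p_value = binom_upper_tail(hits, trials, chance)

# Wald 95% interval (an assumed method; the paper does not state which it used).
half_width = 1.96 * sqrt(hit_rate * (1 - hit_rate) / trials)
ci = (hit_rate - half_width, hit_rate + half_width)

print(f"hit rate = {hit_rate:.3f}, h = {h:.3f}, p = {p_value:.2g}")
print(f"approximate 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Both the tail probability and the interval rely on the constant-and-independent hit probability assumption flagged in the text.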

Honorton et al. (1990) also calculated effect sizes 
for each of the 11 series and each of the eight
experimenters. All but one of the series (the first
novice series) had positive effect sizes, as did all of
the experimenters.

The special series with experienced Rs had an
exceptionally high effect size with h = 0.81, corre-
sponding to 16 direct hits out of 25 trials (64%), but
the remaining series and the experimenters had
relatively homogeneous effect sizes given the
amount of variability expected by chance. If the
special series is removed, the overall hit rate is
32.1%, h = 0.16. Thus, the positive effects are not
due to just one series or one experimenter.
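
As a rough check on this sensitivity analysis, the effect size for the special series and for the remaining ten series can be recomputed from the counts given in the text. This is a sketch under the assumption that h is Cohen's arcsine effect size measured against the 25% chance rate.

```python
from math import asin, sqrt


def cohens_h(p1, p2=0.25):
    """Cohen's h relative to the 25% chance hit rate."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


all_hits, all_trials = 122, 355
special_hits, special_trials = 16, 25

h_special = cohens_h(special_hits / special_trials)

# Drop the special series and recompute the pooled hit rate.
rest_rate = (all_hits - special_hits) / (all_trials - special_trials)
h_rest = cohens_h(rest_rate)

print(f"special series: h = {h_special:.2f}")
print(f"without special series: hit rate = {rest_rate:.3f}, h = {h_rest:.2f}")
```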

Of the 218 trials contributed by novices, 71 were
direct hits (32.5%, h = 0.17), compared with 51
hits in the 137 trials by those with prior ganzfeld
experience (37%, h = 0.26). The hit rates and effect
sizes were 31% (h = 0.14) for the combined pilot
series, 32.5% (h = 0.17) for the combined formal
novice series, and 41.5% (h = 0.35) for the com-
bined experienced series. The last figure drops to
31.6% if the outlier series is removed. Finally,
without the outlier series the hit rate for the com-
bined series where all of the planned trials were
completed was 31.2% (h = 0.14), while it was 35%
(h = 0.22) for the combined series that were termi-
nated early.

scores from zero for the lowest quality, to eight for
the highest. They included features such as ade- 
quate randomization, preplanned analysis and au- 
tomated recording of the results. The correlation 
between study quality and effect size was 0.081, 
indicating a slight tendency for higher quality 
studies to be more successful, contrary to claims by 
critics that the opposite would be true. There was 
a clear relationship between quality and year of 
publication, presumably because over the years 
experimenters in parapsychology have responded 
to suggestions from critics for improving their 
methodology.

File Drawer. Following Rosenthal (1984), the 
authors calculated the “fail-safe N” indicating the
number of unreported studies that would have to be 
sitting in file drawers in order to negate the signifi- 
cant effect. They found N = 14,268, or a ratio of 46 
unreported studies for each one reported. They also 
followed a suggestion by Dawes, Landman and 
Williams (1984) and computed the mean z for all 
studies with z > 1.65. If such studies were a ran- 
dom sample from the upper 5% tail of a N(0,1) 
distribution, the mean z would be 2.06. In this case 
it was 3.61. They concluded that selective reporting 
could not explain these results. 
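
The benchmark of 2.06 follows from the mean of a standard normal variable conditioned on falling in its upper 5% tail, E[Z | Z > c] = φ(c) / (1 − Φ(c)). The sketch below evaluates that expression with the exact 5% cutoff (about 1.645); it illustrates only the benchmark, not the authors' calculation on the study data.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Cutoff leaving exactly 5% in the upper tail (about 1.645).
cutoff = std_normal.inv_cdf(0.95)

# Mean of a standard normal truncated to the region above the cutoff.
mean_upper_tail = std_normal.pdf(cutoff) / 0.05

print(f"cutoff = {cutoff:.3f}, mean z in upper 5% tail = {mean_upper_tail:.2f}")
```

An observed mean of 3.61 among the significant studies is well above this value, which is the basis of the argument against selective reporting.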

Comparisons. Four variables were identified 
that appeared to have a systematic relationship to 
study outcome. The first was that the 25 studies 
using subjects selected on the basis of good past 
performance were more successful than the 223
using unselected subjects, with mean effect sizes of
0.051 and 0.008, respectively. Second, the 97 stud- 
ies testing subjects individually were more success- 
ful than the 105 studies that used group testing; 
mean effect sizes were 0.021 and 0.004, respec- 
tively. Timing of feedback was the third moderat- 
ing variable, but information was only available for 
104 studies. The 15 studies that never told the 
subjects what the targets were had a mean effect 
size of —0.001. Feedback after each trial produced 
the best results, the mean ES for the 47 studies 
was 0.035. Feedback after each set of trials re- 
sulted in mean ES of 0.023 (21 studies), while 
delayed feedback (also 21 studies) yielded a mean 
ES of only 0.009. There is a clear ordering; as the 
gap between time of feedback and time of the 
actual guesses decreased, effect sizes increased. 

The fourth variable was the time interval be- 
tween the subject’s guess and the actual target 
selection, available for 144 studies. The best results 
were for the 31 studies that generated targets less 
than a second after the guess (mean ES = 0.045), 
while the worst were for the seven studies that 


trend, decreasing in order as the time interval 
increased from minutes to hours to days to weeks to 
months.


6.2 Attempts to Influence Random Physical 
Systems 


Radin and Nelson (1989) examined studies de- 
signed to test the hypothesis that “The statistical 
output of an electronic RNG [random number generator]
is correlated with observer intention in accordance
with prespecified instructions” (page
1502). These experiments typically involve RNGs 
based on radioactive decay, electronic noise or pseu- 
dorandom number sequences seeded with true ran- 
dom sources. Usually the subject is instructed to 
try to influence the results of a string of binary 
trials by mental intention alone. A typical protocol 
would ask a subject to press a button (thus starting 
the collection of a fixed-length sequence of bits), 
and then try to influence the random source to 
produce more zeroes or more ones. A run might 
consist of three successive button presses, one each 
in which the desired result was more zeroes or 
more ones, and one as a control with no conscious 
intention. A z score would then be computed for 
each button press.

The 832 studies in the analysis were conducted 
from 1959 to 1987 and included 235 “control” studies,
in which the output of the RNGs was recorded
but there was no conscious intention involved.
These were usually conducted before and during
the experimental series, as tests of the RNGs.

Results. The effect size measure used was again
$z/\sqrt{n}$, where z was positive if more bits of the
specified type were achieved. The mean effect size
for control studies was not significantly different
from zero ($-1.0 \times 10^{-5}$). The mean effect size
for the experimental studies was also very small,
$3.2 \times 10^{-4}$, but it was significantly higher than the
mean ES for the control studies (z = 4.1).
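
To make the effect size measure concrete, the sketch below computes z and z/√n for a single run of binary trials. It assumes a fair-coin null (each bit equally likely to be zero or one) and uses made-up counts; the function name and the example numbers are illustrative only and do not come from the studies reviewed.

```python
from math import sqrt


def rng_effect_size(hits, n_bits):
    """Return (z, z / sqrt(n)) for `hits` bits in the intended direction.

    Assumes each bit is an independent fair coin under the null hypothesis.
    """
    z = (hits - n_bits / 2) / sqrt(n_bits / 4)
    return z, z / sqrt(n_bits)


# Hypothetical run: 5,100 bits of the desired type out of 10,000.
z, es = rng_effect_size(5100, 10000)
print(f"z = {z:.2f}, effect size = {es:.3f}")
```

Under this null the effect size reduces to twice the excess hit proportion, which is why a statistically detectable z can coexist with an effect size on the order of $10^{-4}$ when n is very large.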

Quality. Sixteen quality measures were defined 
and assigned to each study, under the four general 
categories of procedures, statistics, data and the 
RNG device. A score of 16 reflected the highest 
quality. The authors regressed mean effect size on 
mean quality for each investigator and found a 
slope of $2.5 \times 10^{-5}$ with standard error of
$3.2 \times 10^{-5}$, indicating little relationship between quality
and outcome. They also calculated a weighted mean 
effect size, using quality scores as weights, and 
found that it was very similar to the unweighted 
mean ES. They concluded that “differences 
in methodological quality are not significant 
predictors of effect size” (page 1507).

The correlation between extroversion scores and
ganzfeld rating scores was r = 0.18, with a 95%
confidence interval from 0.05 to 0.30. This is con-
sistent with the mean correlation of r = 0.20 for
free-response experiments, determined from the
meta-analysis. These correlations indicate that ex-
troverted subjects can produce higher scores on
free-response ESP tests.


7. CONCLUSIONS


Parapsychologists often make a distinction be-
tween “proof-oriented research” and “process-
oriented research.” The former is typically con-
ducted to test the hypothesis that psi abilities exist,
while the latter is designed to answer questions
about how psychic functioning works. Proof-
oriented research has dominated the literature
in parapsychology. Unfortunately, many of the
studies used small samples and would thus be
nonsignificant even if a moderate-sized effect
exists.

The recent focus on meta-analysis in parapsy-
chology has revealed that there are small but
consistently nonzero effects across studies, experi-
menters and laboratories. The sizes of the effects in
forced-choice studies appear to be comparable to
those reported in some medical studies that had
been heralded as breakthroughs. (See Section 5;
also Honorton and Ferrari, 1989, page 301.) Free-
response studies show effect sizes of far greater
magnitude.

A promising direction for future process-oriented
research is to examine the causes of individual
differences in psychic functioning. The ESP/ex-
troversion meta-analysis is a step in that direction.

In keeping with the idea of individual differ- 
ences, Bayes and empirical Bayes methods would 
appear to make more sense than the classical infer- 
ence methods commonly used, since they would 
allow individual abilities and beliefs to be modeled. 
Jefferys (1990) reported a Bayesian analysis of some
of the RNG experiments and showed that conclu- 
sions were closely tied to prior beliefs even though 
hundreds of thousands of trials were available. 

It may be that the nonzero effects observed in the 
meta-analyses can be explained by something other 
than ESP, such as shortcomings in our understand- 
ing of randomness and independence. Nonetheless, 
there is an anomaly that needs an explanation. As
I have argued elsewhere (Utts, 1987), research in
parapsychology should receive more support from
the scientific community. If ESP does not exist,
there is little to be lost by erring in the direction of

much to be gained by discovering how to enhance
and apply these abilities to important world
problems.

Comment
M. J. Bayarri and James Berger


1. INTRODUCTION


There are many fascinating issues discussed in
this paper. Several concern parapsychology itself
and the interpretation of statistical methodology
therein. We are not experts in parapsychology, and
so have only one comment concerning such mat-
ters: In Section 3 we briefly discuss the need to
switch from P-values to Bayes factors in discussing
evidence concerning parapsychology.

A more general issue raised in the paper is that
of replication. It is quite illuminating to consider
the issue of replication from a Bayesian perspec- 
tive, and this is done in Section 2 of our discussion. 


2. REPLICATION 


Many insightful observations concerning replica-
tion are given in the article, and these spurred us
to determine if they could be quantified within
Bayesian reasoning. Quantification requires clear
delineation of the possible purposes of replication,
and at least two are obvious. The first is simple
reduction of random error, achieved by obtaining
more observations from the replication. The second
purpose is to search for possible bias in the original
experiment. We use “bias” in a loose sense here, to
refer to any of the huge number of ways in which
the effects being measured by the experiment can
differ from the actual effects of interest. Thus a
clinical trial without a placebo can suffer a placebo 
“bias”; a survey can suffer a “bias” due to the 
sampling frame being unrepresentative of the 
actual population; and possible sources of bias 
in parapsychological experiments have been 
extensively discussed. 


Replication to Reduce Random Error 


If the sole goal of replication of an experiment is 
to reduce random error, matters are very straight- 
forward. Reviewing the Bayesian way of studying 
this issue is, however, useful and will be done 
through the following simple example.

EXAMPLE 1. Consider the example from Tversky
and Kahneman (1982), in which an experiment
results in a standardized test statistic of $z_1 = 2.46$.
(We will assume normality to keep computations
trivial.) The question is: What is the highest value
of $z_2$ in a second set of data that would be consid-
ered a failure to replicate? Two possible precise
versions of this question are: Question 1: What is
the probability of observing $z_2$ for which the null
hypothesis would be rejected in the replicated ex-
periment? Question 2: What value of $z_2$ would
leave one’s overall opinion about the null hypothe-
sis unchanged?

Consider the simple case where $Z_1 \sim N(z_1 \mid \theta, 1)$
and (independently) $Z_2 \sim N(z_2 \mid \theta, 1)$, where $\theta$ is
the mean and 1 is the standard deviation of the 
normal distribution. Note that we are considering 
the case in which no experimental bias is suspected 
and so the means for each experiment are assumed 
to be the same. 

Suppose that it is desired to test $H_0: \theta = 0$ versus
$H_1: \theta > 0$, and suppose that initial prior opinion
about $\theta$ can be described by the noninformative
prior $\pi(\theta) = 1$. We consider the one-sided testing
problem with a constant prior in this section, be-
cause it is known that then the posterior probabil-
ity of $H_0$, to be denoted by $P(H_0 \mid \text{data})$, equals the
P-value, allowing us to avoid complications arising
from differences between Bayesian and classical
answers.

After observing $z_1 = 2.46$, the posterior distribu-
tion of $\theta$ is

$$\pi(\theta \mid z_1) = N(\theta \mid 2.46, 1).$$

Question 1 then has the answer (using predictive
Bayesian reasoning)

$$
P(\text{rejecting at level } \alpha \mid z_1)
= \int_{c_\alpha}^{\infty} \int_{-\infty}^{\infty} N(z_2 \mid \theta, 1)\, N(\theta \mid 2.46, 1)\, d\theta\, dz_2
= 1 - \Phi\!\left(\frac{c_\alpha - 2.46}{\sqrt{2}}\right),
$$

where $\Phi$ is the standard normal cdf and $c_\alpha$ is the
(one-sided) critical value corresponding to the level,
$\alpha$, of the test. For instance, if $\alpha = 0.05$, then this
probability equals 0.7178, demonstrating that there
is a quite substantial probability that the second
experiment will fail to reject.
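
The value 0.7178 can be reproduced numerically. The sketch below evaluates the closed form $1 - \Phi((c_\alpha - z_1)/\sqrt{2})$, which follows because the predictive distribution of $Z_2$ given $z_1$ is normal with mean $z_1$ and variance 2 (posterior variance of $\theta$ plus sampling variance). The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)


def prob_replication_rejects(z1, alpha):
    """Predictive probability that a replication rejects H0 at one-sided level alpha.

    Assumes a flat prior on theta, so Z2 | z1 ~ N(z1, variance 2).
    """
    c_alpha = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf((c_alpha - z1) / sqrt(2))


p_reject = prob_replication_rejects(2.46, 0.05)
print(f"P(reject at alpha = 0.05 | z1 = 2.46) = {p_reject:.4f}")
```

Lowering `alpha` in this sketch shows how quickly the chance of a "successful" replication falls even when the first result is nominally significant.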


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 



REPLICATION IN PARAPSYCHOLOGY 381


A sensible candidate for the prior density $\pi(\beta)$
is the Cauchy (0, V) density

$$\pi(\beta) = \frac{1}{\pi V \left[1 + (\beta/V)^2\right]}.$$

Flat-tailed densities, such as this, are well known
to have the property that when discordant data is
observed (e.g., when $|y - x_1|$ is large), substan-
tial mass shifts away from the prior center towards
the likelihood center. It is easy to see that a normal
prior for $\beta$ can not have the desired behavior.

Our first surprise in consideration of these priors
was how small V needed to be chosen in order for
$P(H_0 \mid y, x_1)$ to be unaffected by the bias. For
instance, even with V = 1.54/100 (recall that 1.54
was the standard deviation of Y from the original
experiment), computation yields $P(H_0 \mid y, x_1) =
4.3 \times 10^{-5}$, compared with the P-value (and poste-
rior probability from the original experiment as-
suming no bias) of $2.8 \times 10^{-7}$. There is a clear
lesson here; even very small suspicions of bias can 
drastically alter a small P-value. Note that replica- 
tion 1 is very consistent with the presence of no 
bias, and so the posterior distribution for the bias 
remains tightly concentrated near zero; for instance,
the mean of the posterior for $\beta$ is then
7.2 x 107, and the standard deviation is 0.25. 
When we turned attention to replication 2, we 
found that it did not seriously change the prior 
perceptions of bias. Examination quickly revealed
the reason; even the maximum likelihood estimate
of the bias is no more than 1.4 standard deviations
from zero, which is not enough to change strong 
prior beliefs. We, therefore, considered a third 
experiment, defined in Table 1. Transforming to 
approximate normality, as before, yields 


$$X_3 \sim N(x_3 \mid \theta, 3.48),$$


with $x_3 = 22.72$ being the actual observation. The
maximum likelihood estimate of bias is now 3.95 
standard deviations from zero, so there is potential 
for a substantial change in opinion about the bias. 

Sure enough, computation when V = 1.54/100
yields that $E[\beta \mid y, x_3] = -4.9$ with (posterior)
standard deviation equal to 6.62, which is a dra-
matic shift from prior opinion (that $\beta$ is Cauchy (0,


TABLE 1 
Frequency of heart attacks in replication 3 


Yes 


Aspirin § 


1.54/100)). The effect of this is to essentially ignore
the original experiment in overall assessments of
evidence. For instance, $P(H_0 \mid y, x_3) = 3.81 \times 10^{-11}$,
which is very close to $P(H_0 \mid x_3) = 3.29 \times 10^{-11}$.
Note that, if $\beta$ were set equal to zero, the
overall posterior probability of $H_0$ (and P-value)
would be $2.62 \times 10^{-13}$.
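
Several of these figures can be approximately recovered from the quantities quoted in the text. The sketch below is a rough consistency check, not the authors' computation. It assumes (a) normal likelihoods with standard deviations 1.54 and 3.48, (b) that the original observation y can be backed out from the stated P-value of $2.8 \times 10^{-7}$, and (c) that the zero-bias case is a flat-prior, precision-weighted pooling of the two observations. The Cauchy-prior results are not reproduced here.

```python
from math import erfc, sqrt
from statistics import NormalDist


def upper_tail(z):
    """P(Z > z) for standard normal Z, computed stably in the far tail."""
    return 0.5 * erfc(z / sqrt(2))


sd_y, sd_x3, x3 = 1.54, 3.48, 22.72

# Assumption: recover y from the stated one-sided P-value of the original study.
y = sd_y * NormalDist().inv_cdf(1 - 2.8e-7)

# Maximum likelihood estimate of the bias, in standard deviation units.
bias_z = abs(x3 - y) / sqrt(sd_y**2 + sd_x3**2)

# Posterior probability of H0 from replication 3 alone (flat prior).
p_h0_x3 = upper_tail(x3 / sd_x3)

# Zero-bias case: pool both observations with precision weights.
w_y, w_x = 1 / sd_y**2, 1 / sd_x3**2
pooled_mean = (w_y * y + w_x * x3) / (w_y + w_x)
pooled_sd = 1 / sqrt(w_y + w_x)
p_h0_pooled = upper_tail(pooled_mean / pooled_sd)

print(f"bias MLE = {bias_z:.2f} SDs from zero")
print(f"P(H0 | x3) = {p_h0_x3:.2e}, pooled with no bias = {p_h0_pooled:.2e}")
```

Small differences from the published values are expected because the inputs here are rounded.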


Thus Bayesian reasoning can reproduce the intu- 
ition that replication which indicates bias can cast 
considerable doubt on the original experiment, 
while replication which provides no evidence of 
bias leaves evidence from the original experiment 
intact. Such behavior seems only obtainable, how- 
ever, with flat-tailed priors for bias (such as the 
Cauchy) that are very concentrated (in comparison
with the experimental standard deviation) near 
zero. 


3. P-VALUES OR BAYES FACTORS? 


Parapsychology experiments usually consider 
testing of $H_0$: No parapsychological effect exists.
Such null hypotheses are often realistically repre- 
sented as point nulls (see Berger and Delampady, 
1987, for the reason that care must be taken in 
such representation), in which case it is known that 
there is a large difference between P-values and 
posterior probabilities (see Berger and Delampady, 
1987, for review). The article by Jefferys (1990) 
dramatically illustrates this, showing that a very 
small P-value can actually correspond to evidence 
for $H_0$ when considered from a Bayesian perspective.
(This is very related to the famous “Jeffreys”
paradox.) The argument in favor of the Bayesian 
approach here is very strong, since it can be shown 
that the conflict holds for virtually any sensible 
prior distribution; a Bayesian answer can be wrong 
if the prior information turns out to be inaccurate, 
but a Bayesian answer that holds for all sensible 
priors is unassailable. 

Since P-values simply cannot be viewed as mean- 
ingful in these situations, we found it of interest to 
reconsider the example in Section 5 from a Bayes 
factor perspective. We considered only analysis of 
the overall totals, that is, x = 122 successes out of 
n = 355 trials. Assuming a simple Bernoulli trial 
model with success probability $\theta$, the goal is to test
$H_0: \theta = 1/4$ versus $H_1: \theta \neq 1/4$.

To determine the Bayes factor here, one must
specify $g(\theta)$, the conditional prior density on $H_1$.
Consider choosing g to be uniform and symmetric, 
that is, 


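$$
g(\theta) =
\begin{cases}
\dfrac{1}{2r} & \text{for } \dfrac{1}{4} - r \le \theta \le \dfrac{1}{4} + r, \\[4pt]
0 & \text{otherwise.}
\end{cases}
$$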
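
To illustrate how such a Bayes factor behaves, the sketch below evaluates it numerically for the totals x = 122 and n = 355. It uses the usual definition, the likelihood at $\theta = 1/4$ divided by the likelihood averaged over the uniform prior on $[1/4 - r, 1/4 + r]$, with a simple midpoint rule. The function names, the step count and the example values of r are assumptions made for illustration.

```python
from math import exp, log

x, n, theta0 = 122, 355, 0.25


def log_lik(theta):
    """Bernoulli log-likelihood, omitting the binomial coefficient (it cancels)."""
    return x * log(theta) + (n - x) * log(1 - theta)


def bayes_factor(r, steps=2000):
    """Bayes factor of H0 (theta = 1/4) against a uniform prior of half-width r.

    Requires 0 < r <= 1/4. Uses the midpoint rule on [theta0 - r, theta0 + r].
    """
    lower = theta0 - r
    step = 2 * r / steps
    ref = log_lik(theta0)  # rescale by L(theta0) to avoid underflow
    avg_ratio = sum(
        exp(log_lik(lower + (i + 0.5) * step) - ref) for i in range(steps)
    ) / steps
    return 1.0 / avg_ratio


for r in (0.0001, 0.05, 0.1, 0.25):
    print(f"r = {r}: B = {bayes_factor(r):.4g}")
```

Values well below 1 favor the alternative; as r shrinks toward zero the prior under the alternative collapses onto 1/4 and the Bayes factor approaches 1.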
bate. This debate is also a good example of how
statistical criticism can be part of the scientific 
process and lead to better experiments and, in gen- 
eral, better science. 

The remainder of the paper addresses technical 
issues of meta-analysis, drawing upon recent research
in parapsychology for an in-depth application.
Through a series of examples, the author
presents a convincing argument that power issues 
cannot be overlooked in successive replications and 
that comparison of effect sizes provides a richer 
alternative to the dichotomous measure inherent in 
the use of p-values. This is particularly relevant 
when the potential effect size is small and resources
are limited, as seems to be the case for psi
studies. 

The concluding section briefly mentions Bayesian 
techniques. As noted by the author, Bayes (or em- 
pirical Bayes) methodology seems to make sense for 
research in parapsychology. This discussion exam- 
ines possible Bayesian approaches to meta-analysis 
in this field. 


BAYES MODELS FOR PARAPSYCHOLOGY 


The notion of repeatability maps well into the 
Bayesian set-up in which experiments, viewed as a 
random sample from some superpopulation of ex- 
periments, are assumed to be exchangeable. When 
subjects can also be viewed as an approximately 
random sample from some population, it is appro-
priate to pool them across experiments. Otherwise,
analyses that partially pool information according
to experimental heterogeneity need to be considered.
Empirical and hierarchical Bayes methods
offer a flexible modeling framework for such analy- 
ses, relying on empirical or subjective sources to 
determine the degree of pooling. These richer meth- 
ods can be particularly useful to meta-analysis of 
experiments in parapsychology conducted under 
potentially diverse conditions. 

For the recent ganzfeld series, assuming them 
to be independent binomially distributed as dis- 
cussed in Section 5, the data can be summed 
(pooled) across series to estimate a common hit 
rate. Honorton et al. (1990) assessed the homogene- 
ity of effects across the 11 series using a chi-square 
test that compares individual effect sizes to 
the weighted mean effect. The chi-square statistic 


x2, = 16.25, not statistically significant (p=. 


0.093), largely reflects the contribution of the last 
“special” series (contributes 9.2 units to the x2 
value), and to a lesser extent the novice series with 
a negative effect (contributes 2.5 units). The outlier 
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effects for this data (this result is reported in Sec- 
tion 5). For the remaining 10 series, the chi-square 
value x2 = 7.01 strongly favors homogeneity, al- 
though more than one-third of its value is due to 
the novice series (number 4 in Table 1). This pat- 
tern points to the potential usefulness of a richer 
model to accommodate series that may be distinct 
from the others. For the earlier ganzfeld data ana- 
lyzed by Honorton (1985b), the appeal of a Bayes or 
other model that recognizes the heterogeneity 
across studies is clear cut: x3, = 56.6, p = 0.0001, 
where only those studies with common chance hit 
rate have been included (see Table 2). 
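
The p-value quoted for the homogeneity test on the 11 recent series can be checked with a few lines of code. The sketch below assumes 10 degrees of freedom (11 series minus one) and uses the closed-form upper tail of the chi-square distribution that holds for even degrees of freedom; the helper name is illustrative.

```python
from math import exp, factorial


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))


# Homogeneity statistic for the 11 series, with 10 degrees of freedom assumed.
p_homogeneity = chi2_upper_tail_even_df(16.25, 10)
print(f"p = {p_homogeneity:.3f}")
```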

Historic reliance on voting-count approaches to 
determine the presence of psi effects makes it natu- 
ral to consider Bayes models that focus on the 
ensemble of experimental effects from parapsycho- 
logical studies, rather than individual estimates. 
Recent work in parapsychology that compares ef- 
fect sizes across studies, rather than estimating 
separate study effects, reinforces the need to exam- 
ine this type of model. Louis (1984) develops Bayes 
and empirical Bayes methods for problems that 
consider the ensemble of parameter values to be 
the primary goal, for example, multiple compar-
isons. For the simple compound normal model,
$Y_i \sim N(\theta_i, 1)$, $\theta_i \sim N(\mu, \tau^2)$, the standard Bayes
estimates (posterior means)

$$\theta_i^* = \mu + D(Y_i - \mu) \quad \text{and} \quad D = \frac{\tau^2}{1 + \tau^2},$$

where the $\theta_i$ represent experimental effects of in-
terest, are modified approximately to


$$\theta_i' = \mu + \sqrt{D}\,(Y_i - \mu)$$


when an ensemble loss function is assumed. The 
new estimates adjust the shrinkage factor D so 
that their sample mean and variance match the 
posterior expectation and variance of the $\theta$’s. Similar
results are obtained when the model is gener-
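
The difference between the two sets of estimates is easiest to see numerically. In the sketch below, with $\mu$ and $\tau^2$ treated as known and a toy data vector, the posterior means shrink each observation by D, so their spread is D² times that of the data. The ensemble estimates shrink by $\sqrt{D}$, so their spread is D times that of the data, which on average equals $\tau^2$ because the marginal variance of $Y_i$ is $1 + \tau^2$. Function names and data are illustrative only.

```python
from math import sqrt
from statistics import pvariance


def shrinkage_factor(tau2):
    """D = tau^2 / (1 + tau^2) for unit sampling variance."""
    return tau2 / (1 + tau2)


def posterior_means(y, mu, tau2):
    """Standard Bayes estimates: shrink each observation toward mu by D."""
    d = shrinkage_factor(tau2)
    return [mu + d * (yi - mu) for yi in y]


def ensemble_estimates(y, mu, tau2):
    """Ensemble-loss estimates: shrink by sqrt(D) instead of D."""
    d = shrinkage_factor(tau2)
    return [mu + sqrt(d) * (yi - mu) for yi in y]


# Toy data (hypothetical), with mu = 0 and tau^2 = 1 so that D = 0.5.
y = [-2.0, 0.0, 2.0]
post = posterior_means(y, mu=0.0, tau2=1.0)
ens = ensemble_estimates(y, mu=0.0, tau2=1.0)

print("posterior means:   ", post)
print("ensemble estimates:", [round(v, 3) for v in ens])
print("variance ratios vs data:",
      pvariance(post) / pvariance(y), pvariance(ens) / pvariance(y))
```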


TABLE 1
Recent ganzfeld series


Series type    N Trials    Hit rate
Pilot                      0.36       -0.58    0.44
Pilot                      0.33
Pilot                      0.28
Novice                     0.24
Novice                     0.36
Novice                     0.30
Novice                     0.36
Novice                     0.67
Experienced                0.43
Experienced                0.30
Experienced                0.64

maximum likelihood estimation that modify the
sampling error distribution to yield estimates that 
are “robust” against outlying observations. 

Like its maximum likelihood counterparts, in addition
to the robust effect estimates $\theta_i^*$, the Bayes
model provides (posterior) scale estimates vi‘. These 
can be interpreted as the weight given to the data
for each $\theta_i$ in the analysis and are useful to diag-
nosing which model components (series or studies)
are unusual and how they influence the shrinkage.
When more complex groupings among the $\theta_i$ are
suspected, for example, bimodal distribution of
studies from different sites or experimenters, other
mixture specifications can be used to further relax
the shrinkage toward a common value.

For the 11 ganzfeld series, the last “outlier” 
series, quite distinct from the others (hit rate = 
0.64), is moderately precise (N = 25). Omitting it 
from the analysis causes the overall hit rate to drop 
from 0.344 to 0.321. The scale mixture model is a 
compromise between these two values (on the logit 
scale), discounting the influence of series 11 on the 
estimated posterior common hit rate used for 
shrinkage. The scale factor j,, an indication of 
how separate $\theta_{11}$ is from the other parameters, also
causes $\theta_{11}^*$ to be shrunk less toward the common hit
rate than other, more homogeneous $\theta_i$, giving more
weight to individual information for that series (see
West, 1985). The heterogeneity of the earlier
ganzfeld data is more pronounced, and studies are
taken from a variety of sources over time. For these 
data, the * can be used to explore atypical studies 
(e.g., study 6, with hit rate = 0.90, contributes more 
than 25% to the $\chi^2$ value for homogeneity) and
groupings among effects, as well as protect the
analysis from misspecification of second-stage 
normality. 

Variation among ganzfeld series or studies and 
the degree to which pooling or shrinking is appro- 
priate can be investigated further by considering a 
range of priors for $\tau^2$. If the marginal likelihood of
$\tau^2$ dominates the prior specification, then results
should not vary as the prior for $\tau^2$ is varied. Other-
wise, it is important to identify the degree to which
subjective information about interexperimental 
variability influences the conclusions. This sen- 
sitivity analysis is a Bayesian enrichment of 
the simpler test of homogeneity directed toward 
determining whether or not complete pooling is 
appropriate.
To assess how well heterogeneity among his-
torical control groups is determined by the data,
Dempster, Selwyn and Weeks (1983) propose three
priors for $\tau^2$ in the logistic-normal model. The prior
distributions range from strongly favoring individ-
ual estimates, $p(\tau^2)\,d\tau^2 \propto \tau^{-1}$, to the uniform refer-
ence prior $p(\tau^2)\,d\tau^2 \propto \tau^{-2}$, flat on the log $\tau$ scale, to
strongly favoring complete pooling, $p(\tau^2)\,d\tau^2 \propto \tau^{-3}$
(the latter forcing complete pooling for the com-
pound normal model; see Morris, 1983). For their
two examples, the results (estimates of linear treat-
ment effects) are largely insensitive to variation in
the prior distribution, but the number of studies in 
each example was large (70 and 19 studies avail- 
able for pooling). For the 11 ganzfeld series, 7? may 
be less well determined by the data. The posterior 
estimate of 72 and its sensitivity to p(r*)dr will 
also depend on whether individual scale parame- 
ters are incorporated into the model. Discounting. 
the influence of the last series will both shift the 
marginal likelihood toward smaller values of: 7? 
and concentrate it more in that region. 
The issue of objective assessment of experimental
results is one that extends well beyond the field of
parapsychology, and this paper provides insight into
issues surrounding the analysis and interpretation
of small effects from related studies. Bayes meth-
ods can contribute to such meta-analyses in two
ways. They permit experimental and subjective evi-
dence to be formally combined to determine the
presence or absence of effects that are not clear cut
or controversial (e.g., psi abilities). They can also
help uncover sources and degree of uncertainty in
the scientific conclusions.

advances methodically and objectively through the
accumulation of knowledge (or the rejection of false 
knowledge) derived from the implementation of the 
scientific method. But, as we will see, there is more 
to the acceptance of new scientific discoveries than 
the systematic accumulation and evaluation of 
facts. The recognition that there is a social process 
involved with the acceptance or rejection of scien- 
tific knowledge has been the subject of study of 
sociologists for some time. The scientific commu- 
nity’s rejection of the existence of paranormal phe- 
nomena is an excellent case study of this process 
(Allison, 1979; Collins and Pinch, 1979).

Implicit in Professor Utts’ presentation and 
paramount to the acceptance of parapsychology as 
a legitimate science are the description and docu- 
mentation of the professionalization of the field of 
parapsychology. It is true that many researchers in 
the field have university appointments; there are 
organized professional societies for the advance-
ment of parapsychology; there are journals with
rigorous standards for published research; the field 
has received funding from federal agencies; and 
parapsychology has received recognition from other 
professional societies, such as the IMS and the 
American Association for the Advancement of Sci- 
ence (Collins and Pinch, 1979). Nevertheless, most 
readers of Statistical Science would agree that 
parapsychology is not accepted as part of orthodox 
science and is considered by most of the scientific 
community to be on the margins of science, at best 
(Allison, 1979; Collins and Pinch, 1979). Why is 
this the case? Professor Utts believes that it is 
because people have not examined the data. She 
states that “Strong beliefs tend to be resistant to 
change even in the face of data, and many people, 
scientists included, seem to have made up their 
minds on the question without examining any em- 
pirical data at all.” 

The history of science is replete with examples of 
resistance by the established scientific community 
to new discoveries. A challenging problem for sci- 
ence is to understand the process by which a new 
theory or discovery becomes accepted by the com- 
munity of scientists and, likewise, to characterize 
the nature of the resistance to new ideas. Barber 
(1961) suggests that there are many different 
sources of resistance to scientific discovery. In 1900, 
for example, Karl Pearson met resistance to his use
of statistics in applications to biological problems,
illustrating a source of resistance due to the use of
a particular methodology. The Royal Society in- 
formed Pearson that future papers submitted to the 
Society for publication must keep the mathematics 


entific ideas, and the one referred to by Professor 
Utts above, is the prevailing substantive beliefs 
and theories held by scientists at any given time. 
Barber offers the opposition to Copernicus and his 
heliocentric theory and to Mendel’s theory of ge- 
netic inheritance as examples of how, because of 
preconceived ideas, theories and values, scientists 
are not as open-minded to new advances as one 
might think they should be. It was R. A. Fisher 
who said that each generation seems to have found 
in Mendel’s paper only what it expected to find and 
ignored what did not conform to its own expecta- 
tions (Fisher, 1936). 

Pearson’s response to the antimathematical prej- 
udice expressed by the Royal Society was to estab- 
lish with Galton’s support a new journal, 
Biometrika, to encourage the use of mathematics in 
biology. Galton (1901) wrote an article for the first 
issue of the journal, explaining the need for this 
new voice of “mutual encouragement and support” 
for mathematics in biology and saying that “a new 
science cannot depend on a welcome from the fol-
lowers of the older ones, and [therefore]... it is
advisable to establish a special Journal for Biome- 
try.” Lavoisier understood the role of preconceived 
beliefs as a source of resistance when he wrote in 
1785, 


I do not expect my ideas to be adopted all at 
once. The human mind gets creased into a way 
of seeing things: Those who have envisaged 
nature according to a certain point of view 
during much of their career, rise only with 
difficulty to new ideas. (Barber, 1961.) 


I suspect that this paper by Professor Utts syn- 
thesizing the accumulation of research results sup- 
porting the existence of paranormal phenomena 
will continue to be received with skepticism by the 
orthodox scientific community “even after examin- 
ing the data.” In part, this resistance is due to the 
popular perception of the association between para- 
psychology and the occult (Allison, 1979) and due 
to the continued suspicion and documentation of 
fraud in parapsychology (Diaconis, 1978). An addi- 
tional and important source of resistance to the 
evidence presented by Professor Utts, however, is
the lack of a model to explain the phenomena. 
Psychic phenomena are unexplainable by any cur- 
rent scientific theory and, furthermore, directly 
contradict the laws of physics. Acceptance of psi 
implies the rejection of a large body of accumulated 
evidence explaining the physical and biological 
world as we know it. Thus, even though the effect
size for a relationship between aspirin and the
prevention of heart attacks is three times smaller

of a discipline it turns to meta-analysis to answer
research questions or to resolve controversy (e.g., 
Greenhouse et al., 1990). 

One argument for combining information from 
different studies is that a more powerful result can 
be obtained than from a single study. This objective 
is implicit in the use of meta-analysis in parapsy- 
chology and is the force behind Professor Utts’ 
paper. The issue is that by combining many small 
studies consisting of small effects there is a gain in 
power to find an overall statistically significant 
effect. It is true that the meta-analyses reported by 
Professor Utts find extremely small p-values, but 
the estimate of the overall effect size is still small. 
As noted earlier, because of the small magnitude of 
the overall effect size, the possibility that other 
extraneous variables might account for the rela- 
tionship remains. 

Professor Utts, however, also illustrates the use 
of meta-analysis to investigate how studies differ 
and to characterize the influence of difficult covari- 
ates or moderating variables on the combined esti- 
mate of effect size. For example, she compares the 
mean effect size of studies where subjects were 
selected on the basis of good past performance to 
studies where the subjects were unselected, and she 
compares the mean effect size of studies with feed- 
back to studies without feedback. To me, this latter 
use of meta-analysis highlights the more valuable 
and important contribution of the methodology. 
Specifically, the value of quantitative methods for
research synthesis is in assessing the potential ef-
fects of study characteristics and in quantifying the
sources of heterogeneity in a research domain, that
is, to study systematically the effects of extraneous
variables. Tom Chalmers and his group at Harvard
have used meta-analysis in just this way not only
to advance the understanding of the effectiveness of
medical therapies but also to study the characteris-
tics of good research in medicine, in particular, the
randomized controlled clinical trial. (See Mosteller
and Chalmers, 1991, for a review of this work.)

Professor Utts should be congratulated for her
courage in contributing her time and statistical
expertise to a field struggling on the margins of
science, and for her skill in synthesizing a large
body of experimental literature. I have found her
paper to be quite stimulating, raising many inter-
esting issues about how science progresses or does
not progress.


Comment


Ray Hyman


Utts concludes that “there is an anomaly that
needs explanation.” She bases this conclusion on
the ganzfeld experiments and four meta-analyses of
parapsychological studies. She argues that both
Honorton and Rosenthal have successfully refuted
my critique of the ganzfeld experiments. The meta-
analyses apparently show effects that cannot be
explained away by unreported experiments nor
over-analysis of the data. Furthermore, effect size
does not correlate with the rated quality of the
experiment.

Neither time nor space is available to respond in
detail to her argument. Instead, I will point to 
some of my concerns. I will do so by focusing on 
those parts of Utts’ discussion that involve me. 
Understandably, I disagree with her assertions that 
both Honorton and Rosenthal successfully refuted
my criticisms of the ganzfeld experiments. 

Her treatment of both the ganzfeld debate and 
the National Research Council's report suggests 
that Utts has relied on second-hand reports of the 
data. Some of her statements are simply inaccu- 
rate. Others suggest that she has not carefully read 
what my critics and I have written. This remote-
ness from the actual experiments and details of the
debate may partially account for her optimistic

us. Harris and Rosenthal were commissioned by
our evaluation subcommittee to write a paper on 
evaluation issues, especially those related to exper- 
imenter effects. On their own initiative, Harris and 
Rosenthal surveyed a number of data bases to illus- 
trate the application of methodological procedures 
such as meta-analysis. As one illustration, they 
included a meta-analysis of the subsample of 
ganzfeld experiments used by Honorton in his 
rebuttal to my critique. 

Because Harris and Rosenthal did not them- 
selves do a first-hand evaluation of the ganzfeld 
experiments, and because they used Honorton’s rat- 
ings for their illustration, I did not refer to their 
analysis when I wrote my draft for the chapter on 
the paranormal. Rosenthal told me, in a letter, that 
he had arbitrarily used Honorton’s ratings rather 
than mine because they were the most recent avail- 
able. I assumed that Harris and Rosenthal were 
using Honorton’s sample and ratings to illustrate 
meta-analytic procedures. I did not believe they 
were making a substantive contribution to the 
debate. 

Only after the committee’s complete report was 
in the hands of the editors did someone become 
concerned that Harris and Rosenthal had come to a 
conclusion on the ganzfeld experiments different 
from the committee. Apparently one or more com- 
mittee members contacted Rosenthal and asked him 
to explain why he and Harris were dissenting. 

Because some committee members believed that 
we should deal with this apparent discrepancy, I 
contacted Rosenthal and pointed out that if he had used
my ratings with the very same analysis he had
applied to Honorton’s ratings, he would have
reached a conclusion opposite to what Harris and
he had asserted. I did this, not to suggest my
ratings were necessarily more trustworthy than
Honorton’s, but to point out how fragile any conclu-
sions were based on this small and limited sample.
Indeed, the data were so lacking in robustness that
the difference between my rating and Honorton’s 
rating of one investigator (Sargent) on one at- 
tribute (randomization) sufficed to reverse the con- 
clusions Harris and Rosenthal made about the 
correlation between quality and effect size. 

Harris and Rosenthal responded by adding a foot-
note to their paper. In this footnote, they reported
an analysis using my ratings rather than
Honorton's. This analysis, they concluded, still sup-
ported the null hypothesis of no correlation be-
tween quality and effect size. They used 6 of my 12
dichotomous ratings of flaws as predictors and the z 
score and effect size as criterion variables in both 


lation between criterion variables and flaws of 
“only” 0.46. A true correlation of this magnitude 
would be impressive given the nature and split of 
the dichotomous variables. But, because it was not 
statistically significant, Harris and Rosenthal con- 
cluded that there was no relationship between 
quality and effect size. A canonical correlation on 
this sample of 28 nonindependent cases, of course, 
has virtually no chance of being significant, even if 
it were of much greater magnitude. 

What this amounts to is that the alleged contra- 
dictory conclusions of Harris and Rosenthal are 
based on a meta-analysis that supports Honorton’s 
position when Honorton’s ratings are used and 
supports my position when my ratings are used. 
Nothing substantive comes from this, and it is 
redundant with what Honorton and I have already 
published. Harris and Rosenthal’s footnote adds 
nothing because it supports the null hypothesis 
with a statistical test that has no power against a 
reasonably sized alternative. It is ironic that Utts, 
after emphasizing the importance of considering 
statistical power, places so much reliance on the 
outcome of a powerless test. 

(I should add that the recurrent charge that the 
NRC committee completely ignored Harris and 
Rosenthal's conclusions is not strictly correct. I
wrote a response to the Harris and Rosenthal paper 
that was included in the same supplementary 
volume that contains their commissioned paper.) 

Utts’ discussion of the ganzfeld debate, as I have 
indicated, also shows unfamiliarity with details. 
She cites my factor analysis and Saunders’ critique 
as if these somehow jeopardized the conclusions I 
drew. Again, the matter is too complex to discuss
adequately in this forum. The “factor analysis” she
is talking about is discussed in a few pages of my 
critique. I introduced it as a convenient way to 
summarize my conclusions, none of which depended 
on this analysis. I agree with what Saunders has to 
say about the limitations of factor analysis in this 
context. Unfortunately, Saunders bases his criti- 
cism on wrong assumptions about what I did and 
why I did it. His dismissal of the results as 
“meaningless” is based on mistaken algebra. I in- 
cluded as dummy variables five experimenters in 
the factor analysis. Because an experimenter can 
only appear on one variable, this necessarily forces 
the average intercorrelation among the experi- 
menter variables to be negative. Saunders falsely 
asserts that this negative correlation must be —1. 
If he were correct, this would make the results 
meaningless. But he could be correct only if there 
were just two investigators and that each one ac-

Comment


Robert L. Morris


Experimental sciences by their nature have found
it relatively easy to deal with simple closed sys-
tems. When they come to study more complex, open
systems, however, they have more difficulty in gen-
erating testable models, must rely more on multi-
variate approaches, have more diversity from
experiment to experiment (and thus more difficulty
in constructing replication attempts), have more
noise in the data, and more difficulty in construct-
ing a linkage between concept and measurement.
Data gatherers and other researchers are more
likely to be part of the system themselves. Exam-
ples include ecology, economics, social psychology
and parapsychology. Parapsychology can be re-
garded as the study of apparent new means of
communication, or transfer of influence, between
organism and environment. Any observer attempt-
ing to decide whether or not such psychic communi-
cation has taken place is one of several elements in
a complex open system composed of an indefinite
number of interactive features. The system can be
modeled, as has been done elsewhere (e.g., Morris,
1986) such as to organise our understanding of how
observers can be misled by themselves, or by delib-
erate frauds. Parapsychologists designing experi-
mental studies must take extreme care to ensure
that the elements in the experimental system do
not interact in unanticipated ways to produce arti-
fact or encourage fraudulent procedures. When re-
searchers follow up the findings of others, they
must ensure that the new experimental system
sufficiently resembles the earlier one, regarding its
important components and their potential interac-
tions. Specifying sufficient resemblance is more dif-
ficult in complex and open systems, and in areas of
research using novel methodologies.

As a result, parapsychology and other such areas 
may well profit from the application of modern 
meta-analysis, and meta-analytic methods may in 
turn profit from being given a good stiff workout by 
controversial data bases, as suggested by Jessica 
Utts in her article. Parapsychology would appear to 
gain from meta-analytic techniques, in at least 
three important areas.

First, in assessing the question of replication
rate, the new focus on effect size and confidence
intervals rather than arbitrarily chosen signifi-
cance levels seems to indicate much greater consis-
tency in the findings than has previously been
claimed.

Robert L. Morris occupies the Koestler Chair of

Second, when one codes the individual studies for
flaws and relates flaw abundance with effect size,
there appears to be little correlation for all but one
data base. This contradicts the frequent assertion
that parapsychological results disappear when
methodology is tightened. Additional evidence on
this point is the series of studies by Honorton and
associates using an automated ganzfeld procedure,
apparently better conducted than any of the previ-
ous research, which nevertheless obtained an effect
size very similar to that of the earlier more diverse
data base.

Third, meta-analysis allows researchers to look
at moderator variables, to build a clearer picture of
the conditions that appear to produce the strongest
effects. Research in any real scientific discipline
must be cumulative, with later researchers build-
ing on the work of those who preceded them. If our
earlier successes and failures have meaning, they
should help us obtain increasingly consistent,
clearer results. If psychic ability exists and is suffi-
ciently stable that it can be manifest in controlled
experimental studies, then moderator variables
should be present in groups of studies that would
indicate conditions most favourable and least
favourable to the production of large effect sizes.
From the analyses presented by Utts, for instance,
it seems evident that group studies tend to produce
poor results and, however convenient it may be to
conduct them, future researchers should apparently
focus much more on individual testing. When doing
ganzfeld studies, it appears best to work with dy-
namic rather than static target material and with
experienced participants rather than novices. If
such results are valid, then future researchers who
wish to get strong results now have a better idea of
what procedures to select to increase the likelihood
of so doing, what elements in the experimental
system seem most relevant. The proportion of stud-
ies obtaining positive results should therefore
increase.

However, the situation may be more complex 
than the somewhat ideal version painted above. As 
noted earlier, meta-analysis may learn from para- 
psychology as well as vice versa. Parapsychological 
data may well give meta-analytic techniques a good

sd wntY anetataler ance came’ challenges.

misses estimated; perhaps Cohen’s h greatly un-
derestimates effect size when very low probability
events (less than 1 in 50 for heart attack in the
placebo condition and less than 1 in 100 for
aspirin) are involved. I’m not a statistician and
thus don’t know if there is a relevant literature on
this point.

The above objections should not detract from the
overall value of the Utts survey. The findings she
reports will need to be replicated; but even as is,
they provide a challenge to some of the cherished
arguments of counteradvocates, yet also challenge
serious researchers to use these’ sais uccarely

sl risa for fre studies.


Comment
Frederick Mosteller


Dr. Utts’s discussion stimulates me to offer some
comments that bear on her topic but do not, in the
main, fall into an agree-disagree mode. My refer-
ences refer to her bibliography.

Let me recommend J. Edgar Coover’s work to
statisticians who would like to read about a pretty
sequence of experiments developed and executed
well before Fisher’s book on experimental design
appeared. Most of the standard kinds of ESP exper-
iments (though not the ganzfeld) are carried out
and reported in this 1917 book. Coover even began
looking into the amount of information contained
in cues such as whispers. He also worked at expos-
ing mediums. I found the book most impressive. As
Utts says in her article, the question of significance
level was a puzzling one, and one we still cannot
solve even though some fields seem to have stan-
dardized on 0.05.


Frederick Mosteller is Roger I. Lee Professor of
Mathematical Statistics, Emeritus, at Harvard Uni-
versity and Director of the Technology Assessment
Group in the Harvard School of Public Health. His
mailing address is Department of Statistics, Har-


When Feller’s comments on Stuart and Green-
wood’s sampling experiments came out in the first
edition of his book, I was surprised. Feller devotes
a problem to the results of generating 25 symbols
from the set a, b, c, d and e (page 45, first edition)
using random numbers with 0 and 1 corresponding
to a, 2 and 3 to b, etc. He asks the student to find
out how often the 25 produce 5 of each symbol. He
asks the student to check the results using random
number tables. The answer seems to be about 1
chance in 500. In a footnote Feller then says “They
[random numbers] are occasionally extraordinarily
obliging: c.f. J. A. Greenwood and E. E. Stuart,
Review of Dr. Feller’s Critique, Journal of Para-
psychology, vol. 4 (1940), pp. 298-319, in particular
p. 306.” The 25 symbols of 5 kinds, 5 of each,
correspond to the cards in a parapsychology deck.
The point of page 306 is that Greenwood and
Stuart on that page claim to have generated two
random orders of such a deck using Tippett’s table
of random numbers. Apparently Feller thought that
it would have taken them a long time to do it. If
one assumes that Feller’s way of generating a ran-
dom shuffle is required, then it would indeed be
unreasonable to suppose that the experiments could
be carried out quickly. I wondered then whether
Feller thought this was the only way to produce a
random order to such a deck of cards. If you happen
to know how to shuffle a deck efficiently using
random numbers, it is hard to believe that others
do not know. I decided to test it out and so I
proposed to a class of 90 people in mathematical
statistics that we find a way of using random num-
bers to shuffle a deck of cards. Although they were
familiar with random numbers, they could not come
up with a way of doing it, nor did anyone after class
come in with a workable idea though several stu-
dents made proposals. I concluded that inventing
such a shuffling technique was a hard problem and
that maybe Feller just did not know how at the
time of writing the footnote. My face-to-face at-
tempts to verify this failed because his response
was evasive. I also recall Feller speaking at a
scientific meeting where someone had complained
about mistakes in published papers. He said essen-
tially that we won’t have any literature if mistakes
are disallowed and further claimed that he always
had mistakes in his own papers, hard as he tried to
avoid them. It was fun to hear him speak.
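
The two technical points in this anecdote can be sketched in a few lines. The first function evaluates the chance that Feller's direct scheme, drawing 25 symbols independently and uniformly from five kinds, yields exactly five of each. The second shuffles a 25-card deck with the Fisher-Yates algorithm, one standard way of turning random numbers into a random order. Fisher-Yates is used purely as an illustration: the text does not say which method Greenwood and Stuart, or Mosteller, had in mind, and the function names are hypothetical.

```python
import math
import random


def prob_five_of_each(kinds=5, per_kind=5):
    """Chance that kinds*per_kind independent uniform draws give exactly
    per_kind of every kind (Feller's direct scheme)."""
    n = kinds * per_kind
    # Multinomial count of sequences with per_kind copies of each symbol.
    arrangements = math.factorial(n) // math.factorial(per_kind) ** kinds
    return arrangements / kinds ** n


def fisher_yates_shuffle(deck, rng):
    """Return a uniformly random ordering of deck (Fisher-Yates).
    Illustrative choice of algorithm, not one named in the text."""
    cards = list(deck)
    for i in range(len(cards) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform on 0..i inclusive
        cards[i], cards[j] = cards[j], cards[i]
    return cards


if __name__ == "__main__":
    p = prob_five_of_each()
    print(f"P(five of each) = {p:.5f}, about 1 in {1 / p:.0f}")
    deck = [symbol for symbol in "abcde" for _ in range(5)]
    print("".join(fisher_yates_shuffle(deck, random.Random(1940))))
```

The probability comes out at roughly 1 in 478, in line with the "about 1 chance in 500" quoted above, whereas the shuffle needs only 24 random draws per deck.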

Although I find Utts’s discussion of replication 
engaging as a problem in human perception, I do
always feel that people should not be expected to
carry out difficult mathematical exercises in their
head, off the cuff, without computers, textbooks or

ited PRL for several days and was a subject in
Series 101” (pages 134-135).


Honorton has also informed me (personal communi-
cation, July 25, 1991) that several self-proclaimed
skeptics have visited his laboratory and received
demonstrations of the autoganzfeld procedure and 
that no one expressed any concern with the secu- 
rity arrangements. 

This may not completely satisfy Professor Diaco- 
nis’ objections, but it does indicate a serious effort 
on the part of the researchers to involve such peo- 
ple. Further, the original publication of the re- 
search in Section 5 followed the reporting criteria 
established by Hyman and Honorton (1986), thus 
providing much more detail for the reader than the 
earlier published records to which Professor 
Diaconis alludes. 


Points Raised by Greenhouse 


Greenhouse enumerated four items that offer al- 
ternative explanations for the observed anomalous 
effects. Three of these (items 2-4) will be addressed 
in this section by elaborating on the details pro-
vided in my paper. His item 1 will be addressed in
a later section. 

Item 2 on his list questioned the role of experi-
menter expectancy effects as a potential confounder
in parapsychological research. While the expecta-
tions of the experimenter may influence the report-
ing of results, the ganzfeld experiments (as well as
other psi experiments) are conducted in such a way
that experimenter expectancy cannot account for
the results themselves. Rosenthal, whom Greenhouse
cites as the expert in this area, addressed this in
his background paper for the National Research
Council (Harris and Rosenthal, 1988a) and con-
cluded that the ganzfeld studies were adequately
controlled in this regard. He also visited the auto-
ganzfeld laboratory and was given a demonstration
of that procedure.

Greenhouse’s item 3, the question of what consti- 
tutes a direct hit, was addressed in my paper but 
perhaps needs elaboration. Although free-response
experiments do generate substantial amounts of 
subjective data, the statistical analysis requires 
that the results for each trial be condensed into a 
single measure of whether or not a direct hit was 
achieved. This is done by presenting four choices to 
a judge (who of course does not know the correct 
answer) and asking the judge to decide which of the 
four best matches the subject’s response. If the 
judge picks the target, a direct hit has occurred. 

It is true that different judges may differ on their 


opinions of whether or not there has been a direct


cal question is the same. Under the null hypothe- 
sis, since the target is randomly selected from the 
four possibilities presented, the probability of a 
direct hit is 0.25 regardless of who does the judg- 
ing. Thus, the observed anomalous effects cannot 
be explained by assuming there was an over- 
optimistic judge. 
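
The argument can be illustrated with an exact binomial calculation. The sketch below computes the probability of at least a given number of direct hits when each trial succeeds with probability 0.25, whoever does the judging. The trial and hit counts in the usage lines are hypothetical and are not taken from any study discussed here.

```python
from math import comb


def binomial_upper_tail(hits, trials, p=0.25):
    """P(X >= hits) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    trials, hits = 40, 16
    tail = binomial_upper_tail(hits, trials)
    print(f"P(at least {hits} hits in {trials} trials | p = 0.25) = {tail:.4f}")
```

The judge enters this calculation nowhere: only the random choice of target among four candidates fixes the null probability.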

If Professor Greenhouse is suggesting that the 
source of judging may be a moderating variable 
that determines the magnitude of the demonstrated 
anomalous effect, I agree. The parapsychologists
have considered this issue in the context of whether
or not subjects should serve as judges for their own
sessions, with differing opinions in different labora- 
tories. This is an example of an area that has been 
suggested for further research. 

Finally, Greenhouse raised the question of the 
accuracy of the file-drawer estimates used in the 
reported meta-analyses. I agree that it is instruc- 
tive to examine the file-drawer estimate using more 
than one model. As an example, consider the 39 
studies from the direct hit and autoganzfeld data 
bases. Rosenthal’s fail-safe N estimates that there 
would have to be 371 studies in the file-drawer to 
account for the results. In contrast, the method 
proposed by Iyengar and Greenhouse gives a file-
drawer estimate of 258 studies. Even this estimate
is unrealistically large for a discipline with as few 
researchers as parapsychology. Given that the av- 
erage number of trials per experiment is 30, this 
would represent almost 8000 unreported trials, and 
at least that many hours of work. 
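
For readers who want to reproduce this kind of calculation, the sketch below implements the usual form of Rosenthal's fail-safe N, $(\sum z_i)^2 / z_\alpha^2 - k$ with $z_\alpha = 1.645$, together with the trial-count arithmetic quoted above. The individual z-scores of the 39 studies are not given in this discussion, so the function is exercised only on made-up values, and the function names are hypothetical.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: the number of unreported studies averaging
    z = 0 needed to pull the combined (Stouffer) z down to z_alpha."""
    k = len(z_scores)
    total = sum(z_scores)
    return total**2 / z_alpha**2 - k


def unreported_trials(file_drawer_studies, trials_per_study):
    """Trials implied by a file drawer of the given size."""
    return file_drawer_studies * trials_per_study


if __name__ == "__main__":
    # Made-up z-scores, for illustration only.
    print(fail_safe_n([2.1, 0.4, 1.7, -0.3, 2.6]))
    # 258 studies at 30 trials each: the "almost 8000" in the text.
    print(unreported_trials(258, 30))
```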

There are pros and cons to any method of esti- 
mating the number of unreported studies, and the 
actual practices of the discipline in question should 
be taken into account. Recognizing publication bias 
as an issue, the Parapsychological Association has 
had an official policy since 1975 against the selec- 
tive reporting of positive results. Of the original 
ganzfeld studies reported in Section 4 of my paper, 
less than half were significant, and it is a matter of 
record that there are many nonsignificant studies 
and “failed replications” published in all areas of 
psi research. Further, the autoganzfeld database 
reported in Section 5 has no file-drawer. Given the 
publication practices and the size of the field, the 
proposed file-drawer cannot account for the ob- 
served effects. 


Points Raised by Hyman 


One of my goals in writing this paper was to 
present a fair account of recent work and debate in 
parapsychology. Thus, I was disturbed that Hy- 
man, who has devoted much of his career to the 
study of parapsychology, and who had first-hand

and with the outcome canonical variable but three
correlated negatively” (page 2, italics added). 
Rosenthal (personal communication, July 23, 1991) 
verified that this was indeed the point he was 
trying to make. Readers who are interested in 
drawing their own conclusions from first-hand 
analyses can find Hyman’s original flaw codings in 
an Appendix to his paper (Hyman, 1985, pages 
44-49). 

Finally, in my paper, I stated that the parapsy- 
chology chapter of the National Research Council 
report critically evaluated statistically significant 
experiments, but not those that were nonsignifi- 
cant. Professor Hyman “does not know how [I] got 
such an impression,” so I will clarify by outlining 
some of the material reviewed in that report. There 
were surveys of three major areas of psi research: 
remote viewing (a particular type of free-response 
experiment), experiments with random number 
generators, and the ganzfeld experiments. As an 
example of where I got the impression that they 
evaluated only significant studies, consider the sec- 
tion on remote viewing. It began by referencing a 
published list of 28 studies. Fifteen of these were
immediately discounted, since “only 13... were 
published under refereed auspices” (Druckman and 
Swets, 1988, page 179). Four more were then dis- 
missed, since “Of the 13 scientifically reported 
experiments, 9 are classified as successful” (page 
179). The report continued by discussing these nine 
experiments, never again mentioning any of the 
remaining 19 studies. The other sections of the 
report placed similar emphasis on significant studies.
I did not think this was a valid statistical
method for surveying a large body of research. 


Minor Point Raised by Morris 


The final clarification I would like to offer con- 
cerns the minor point raised by Professor Morris, 
that “When Honorton omitted studies that did not 
report direct hits as a measure, he may have biased 
his sample.” This possibility was explicitly ad- 
dressed by Honorton (1985, page 59). He examined 
what would happen if z-scores of zero were inserted 
for the 10 studies for which the number of direct 
hits was not measured, but could have been. He 
found that even with this conservative scenario, 
the combined z-score only dropped from 6.60 to 
5.67. 
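
The arithmetic behind this check can be sketched as follows. The sketch assumes that the combined z-score is an unweighted Stouffer combination (the sum of the study z-scores divided by the square root of the number of studies) and that 28 studies reported direct hits; neither assumption is stated in this section, and the variable names are illustrative only.

```python
import math

# Assumption: 28 studies reported direct hits, with a combined z of 6.60.
k_reported = 28
combined_z = 6.60

# Under an unweighted Stouffer combination, Z = sum(z_i) / sqrt(k), so the
# sum of the individual z-scores can be recovered from the combined value.
z_sum = combined_z * math.sqrt(k_reported)

# Conservative scenario: add 10 studies, each contributing z = 0.
# The sum is unchanged; only the number of studies in the divisor grows.
k_added = 10
adjusted_z = z_sum / math.sqrt(k_reported + k_added)

print(round(adjusted_z, 2))  # 5.67
```

Under these assumptions the inserted zeros leave the numerator untouched and only enlarge the divisor, which is why the combined value shrinks by a factor of sqrt(28/38) rather than collapsing.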


SATISFYING THE SKEPTICS


Parapsychology is probably the only scientific
discipline for which there is an organization of
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Paranormal (CSICOP) was established in 1976 by 
philosopher Paul Kurtz and sociologist Marcello 
Truzzi when “Kurtz became convinced that the
time was ripe for a more active crusade against 
parapsychology and other pseudo-scientists” (Pinch 
and Collins, 1984, page 627). Truzzi resigned from 
the organization the next year (as did Professor 
Diaconis) “because of what he saw as the growing 
danger of the committee’s excessive negative zeal 
at the expense of responsible scholarship” (Collins 
and Pinch, 1982, page 84). In an advertising 
brochure for their publication The Skeptical In- 
quirer, CSICOP made clear its belief that paranor- 
mal phenomena are worthy of scientific attention 
only to the extent that scientists can fight the 
growing interest in them. Part of the text of the 
brochure read: “Why the sudden explosion of inter- 
est, even among some otherwise sensible people, in 
all sorts of paranormal ‘happenings’?... Ten years 
ago, scientists started to fight back. They set up an 
organization—The Committee for the Scientific In- 
vestigation of Claims of the Paranormal.” 

During the six years that I have been working 
with parapsychologists, they have repeatedly ex- 
pressed their frustration with the unwillingness of 
the skeptics to specify what would constitute acceptable
evidence, or even to delineate criteria for
an acceptable experiment. The Hyman and Honor- 
ton Joint Communiqué was seen as the first major 
step in that direction, especially since Hyman was 
the Chair of the Parapsychology Subcommittee of 
CSICOP. 

Hyman and Honorton (1986) devoted eight pages 
to “Recommendations for Future Psi Experiments,” 
carefully outlining details for how the experiments 
should be conducted and reported. Honorton and 
his colleagues then conducted several hundred 
trials using these specific criteria and found essen- 
tially the same effect sizes as in earlier work for 
both the overall effect and effects with moderator 
variables taken into account. I would expect Professor
Hyman to be very interested in the results of
these experiments he helped to create. While he did 
acknowledge that they “have produced intriguing 
results,” it is both surprising and disappointing 
that he spent only a scant two paragraphs at the 
end of his discussion on these results. 

Instead, Hyman seems to be proposing yet an- 
other set of requirements to be satisfied before 
parapsychology should be taken seriously. It is dif- 
ficult to sort out what those requirements should be 
from his account: “[They should] specify, in advance,
the complete sample space and the critical
region. When they get to the point where they can 
specify this along with some boundary conditions
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cal studies, resulting from the observation by one 
physician that his lung cancer patients who smoked 
did not recover at the same rate as those who did 
not. There are many medications in common use 
for which there is still no medical explanation for 
their observed therapeutic effectiveness, but that 
does not prohibit their use. 

There are also examples where a coherent theory 
of a phenomenon was impossible because the re- 
quisite background information was missing. For 
instance, the current theory of endorphins as an 
explanation for the success of acupuncture would 
have been impossible before the discovery of endor- 
phins in the 1970s. 

Mosteller’s observation that ESP will not replace 
the telephone leads to the question of whether or 
not psi abilities are of any use even if they do exist, 
since the effects are relatively small. Again, a look 
at history is instructive. For example, in 1938 For- 
tune Magazine reported that “At present, few sci- 
entists foresee any serious or practical use for 
atomic energy.” 

Greenhouse implied that I think parapsychology 
is not accepted by more of the scientific community 
only because they have not examined the data, but 
this misses the main point I was trying to make.
The point is that individual scientists are willing to
express an opinion without any reference to data.
The interesting sociological question is why they 
are so resistant to examining the data. One of the 
major reasons is undoubtedly the perception identi- 
fied by Greenhouse that there is some connection 
between parapsychology and the occult, or worse, 
religious beliefs. Since religion is clearly not in the 
realm of science, the very thought that parapsy- 
chology might be a science leads to what psychol- 
ogists call “cognitive dissonance.” As noted by 
Griffin (1988), “People feel unpleasantly aroused 
when two cognitions are dissonant—when they con- 
tradict one another” (page 33). Griffin continued by 
observing that there are also external reasons for 
scientists to discount the evidence, since “It is gen- 
erally easier to be a skeptic in the face of novel 
evidence; skeptics may be overly conservative, but 
they are rarely held up to ridicule” (page 34). 

In summary, while it may be safer and more 
consonant with their beliefs for individual scien- 
tists to ignore the observed anomalous effects, the 
scientific community should be concerned with 
finding an explanation. The explanations proposed 
by Greenhouse and others are simply not tenable. 


REPLICATION AND MODELING 


Parapsychology is one of the few areas where a


specify what should happen if there is no such
thing as ESP by using simple binomial models,
either to find p-values or Bayes factors. As noted 
by Mosteller, if there is no ESP, or other nonstatis- 
tical explanation for an effect, we should be able to 
carry out null experiments and get no effect. Other- 
wise, we should be worried about using these sim- 
ple models for other applications. 
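
A minimal sketch of the two summaries mentioned above, an exact binomial p-value and a Bayes factor for a point null hypothesis, is given below. The trial counts, the chance hit rate of 0.25, and the uniform Beta(1, 1) prior under the alternative are all assumptions chosen for illustration; in particular, the prior is not the one used by Bayarri and Berger, and the function names are hypothetical.

```python
import math

def binom_pmf(x, n, p):
    """Probability of exactly x hits in n independent trials with hit rate p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_upper_p_value(x, n, p0):
    """One-sided exact p-value: P(X >= x) when X ~ Binomial(n, p0)."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))

def log_beta(a, b):
    """Logarithm of the Beta function, computed with log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bayes_factor_null(x, n, p0, a=1.0, b=1.0):
    """Bayes factor for H0: p = p0 against H1: p ~ Beta(a, b).

    Values below 1 favor the alternative. The binomial coefficient
    appears in both marginal likelihoods and therefore cancels.
    """
    log_m0 = x * math.log(p0) + (n - x) * math.log(1 - p0)
    log_m1 = log_beta(a + x, b + n - x) - log_beta(a, b)
    return math.exp(log_m0 - log_m1)

# Hypothetical data, not taken from any experiment discussed here:
# 35 direct hits in 100 trials, with an assumed chance hit rate of 0.25.
n_trials, n_hits, chance = 100, 35, 0.25
p_value = binom_upper_p_value(n_hits, n_trials, chance)
bf01 = bayes_factor_null(n_hits, n_trials, chance)
print(f"p-value = {p_value:.4f}, Bayes factor (null vs. alternative) = {bf01:.3f}")
```

Running the same functions on data from a null experiment gives a direct check of whether the simple binomial model behaves as expected when no effect is present.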

Greenhouse, in his first alternative explanation 
for the results, questioned the use of these simple 
models, but his criticisms do not seem relevant to 
the experiments discussed in Section 5 of my paper. 
The experiments to which he referred were either 
poorly controlled, in which case no statistical anal- 
ysis could be valid, or were specifically designed to 
incorporate trial by trial feedback in such a way 
that the analysis needed to account for the added 
information. Models and analyses for such experi- 
ments can be found in the references given at the 
end of Diaconis’ discussion. 

For the remainder of this discussion, I will con- 
fine myself to models appropriate for experiments 
such as the autoganzfeld described in Section 5. It 
is this scenario for which Bayarri and Berger com- 
puted Bayes factors, and for which Dawson dis- 
cussed possible Bayesian models. 

If ESP does exist, it is undoubtedly a gross over- 
simplification to use a simple non-null binomial 
model for these experiments. In addition to poten- 
tial differences in ability among subjects, there 
were also observed differences due to dynamic ver- 
sus static targets, whether or not the sender was a 
friend, and how the receiver scored on measures of 
extraversion. All of these differences were antici- 
pated in advance and could be incorporated into 
models as covariates. 

It is nonetheless instructive to examine the Bayes 
factor computed by Bayarri and Berger for the 
simple non-null binomial model. First, the observed 
anomalous effects would be less interesting if the 
Bayes factor was small for reasonable values of r, 
as it was for the random number generator experi- 
ments analyzed by Jefferys (1990), most of which 
purported to measure psychokinesis instead of ESP. 
Second, the Bayes factor provides a rough measure 
of the strength of the evidence against the null 
hypothesis and is a much more sensible summary 
than the p-value. The Bayes factors provided by 
Bayarri and Berger are probably more conserva- 
tive, in the sense of favoring the null hypothesis, 
than those that would result from priors elicited 
from parapsychologists, but are probably reasonable
for those who know nothing about past observed
effects. I expect that most parapsychologists
would not opt for a prior symmetric around chance, 
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THE ENHANCED HUMAN PERFORMANCE PROJECT: Jog 
AN ASSESSMENT OF THE EFFORT TO DATE 




PROJECT REVIEW GROUP 
14 APRIL, 1987 


At the request of MG Philip K. Russell, MC, Commander, United States Army Medical 
Research and Development Command, the following individuals met at the Pentagon on 6
March 1987 to assess the work of the Enhanced Human Performance Project:


Ms. Amoretta Hoeber, TRW
Dr. Jack Vorona, DIA
Dr. Michael A. Wartell, Humboldt State University
Dr. Nick Yaru, Consultant (Chairman)
Dr. Chris Zarafonetis, Biomedical R&D, Inc.


Others in attendance at this meeting included: 


BG Richard T. Travis, MC, Deputy Commander, USAMRDC
Col. Philip Sobocinski, MSC, Special Assistant for Biotechnology
Col. Peter J. McNelis, MSC, Project Manager/COR
Mrs. Jean Smith, Principal Assistant Responsible for Contracting
Dr. Edwin C. May, SRI, Principal Investigator


In preparation for this meeting, copies of all Project reports for Fiscal Year 1986 along
with the Scientific Oversight Committee’s comments regarding these reports and the
contractor’s responses to the comments were forwarded to each of the above-mentioned
individuals for their review.


The Project Review Group was asked, via correspondence (MG Russell, 12 January
1987; Col. McNelis, 12 February 1987) and by BG Travis in his welcoming remarks at the
meeting, to address the following questions concerning the Project:

1. Is the science underlying this research effort essentially sound?
2. Does the evidence to date support the existence of an anomaly?
3. What is the potential value of this effort to the DOD?
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4. Is the research focus and level of effort appropriate? *" 


The agenda for the meeting is attached as Enclosure 1. Following a presentation of the
Project's historical antecedents, the questions listed above provided the structure for a
discussion of FY 1986 research tasks and results, the overall plan underlying the FY 1986
effort, and possible modifications of the plan for follow-on work.


The Review Group’s responses to the preceding questions and their recommendations for
the Project will be presented in turn. It should be noted that there was unanimity among the
members of the Review Group with regard to these responses.


1. Is the science sound?


The individual experiments conducted during Fiscal Year 1986 appear to be 
scientifically sound. The primary contractor’s response to comments of the 
Scientific Oversight Committee (SOC) leads this Review Group to conclude 
that the scientific quality of the effort is under continual qualified scrutiny,
and immediate adjustments are made by the researchers to ensure that that
quality continues. Additionally, appropriate community-wide symposia such
as the Theory and Proof of Principle conferences projected for FY 1987 will 
enhance that quality. 


2. Is there an anomaly?


The results of experiments conducted by this Project during FY 1986, as well 
as other reports of previous operational related research, lead this Review 
Group to conclude that a natural anomaly exists, which we will refer to as
Remote Viewing.


3. Is it worthwhile?


The Review Group believes that progress is being made in understanding this
anomaly and that continuation of the effort is not only warranted, but entirely
appropriate and strongly recommended.


Should Remote Viewing be predictably reproducible and its mechanisms,
parameters and physiological correlates understood, there would be a number
of significant applications for the DoD. Current user agencies have reported
utilizing the present technology with positive results.
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Is the direction and emphasis appropriate? 


The Review Group believes that the probability of success in demonstrating 
and explaining a phenomenon known as Remote Action is less than the 
probability of success for the Remote Viewing phenomenon. Rather than 
continuing to explore both phenomena at equal levels of effort, it is 
recommended that the results of this year’s (FY87) effort be critically 
reviewed and those areas that demonstrate the most promise be exploited and 
those that do not be terminated. The focus then would be less diffuse and
more vertical as the more productive pathways are emphasized.


This should not be considered an economy measure, however, since the
vertical effort should be assured of adequate resources to accomplish its more
definitive tasks.


The Review Group also recommends that the Project should clarify its use of 
the terms: global/conceptual replication (i.e., other labs evidence the 
phenomena without following the same protocol), exact/technical replication
(i.e., phenomena evidenced in other labs following the same protocol with 
other subjects and other targets), and reproducibility (i.e., phenomena 
evidenced by the same subjects over time utilizing the same randomly ordered 
target set). With this in mind, it is recommended that an effort be made to 
enhance the reproducibility of the phenomena by identifying and utilizing 
especially talented individuals. It is believed that this pool of talented 
subjects would also aid in isolating neurophysiological correlates and
mechanisms.


It is also recommended that one or two other secure labs be identified to
carry out exact/technical replication of the most promising experiments
conducted by the primary contractor.


Overall, the current breadth of experiments selected to demonstrate and
explicate the phenomena is appropriate, as is the present level of effort
assigned to each of these experiments.
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In summary, the Project Review Group has determined to its satisfaction that the work 
of the Enhanced Human Performance Project is scientifically sound, appropriately managed 
and monitored, and is providing valuable insight into the nature of an anomaly which could
have a significant impact on the DoD.


Dr. Nick Yaru, Chairman
Project Review Group
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PPENDIX 
IN-HOUSE STAFFING REQUIREMENTS 


(S/NF/SG/LIMDIS) An analysis of the PAG-TA functions 
necessary to support the achievement of the long-range goals 
indicates four major functional areas which must be supported.
Within each functional area, personnel requirements can be 
identified. A complicating factor, however, is the fact that 
some of the functional areas (such as remote viewing (RV), 
Intelligence Analysis, and ADP support) are highly specialized 
and require full-time dedicated personnel. 


1. (S/NF/SG/LIMDIS) RV Activities: RV activities can be 
grouped into the following major areas: 


a. Participate in R&D activities with the
external R&D contractor 

b. Viewer Training (both in-house and with 
the external R&D contractor) 

c. Operational Activities 


(S/NF/SG/LIMDIS) It is difficult to project personnel 
requirements for this functional area, primarily because the 
projected level of operational activity is currently unknown. 
Based on the past level of operational tasking, it is anticipated 
that up to six personnel could be required. Five of the people 
would be involved in operational activities as well as 
participating in support of the R&D activities to be conducted by 
the external Contractor. One additional person would be 
designated to participate in operational and research support 
activities on a part-time basis but would devote most of his time 
to developing a training program and conducting training of new 
personnel and identification/selection of potential viewers. Due 
to the specialized nature of RV, this person needs to be a 
qualified viewer and not merely an administrative person. It 
should also be kept in mind that it takes approximately one year 
to train a viewer to operational status. 


2. (U) Foreign Intelligence Assessment: Support of this 
functional area may be grouped into the following activities: 


a. Data source identification/collection 
b. Construction of Foreign Activities 
Data Base 
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c. Analysis 
d. Production of finished intelligence
assessments 


(U) To adequately meet the requirements of this 
functional area, two full-time personnel will be required: a 
Senior Intelligence Officer (SIO) and an Intelligence Technician 
(IT). In order to maintain strict protocol requirements, these 
personnel should not function as operational viewers. 


(U) The IT would identify potential sources of data, 
collect the data, support the construction of the Intelligence 
database and input the required data, and assist in the
preparation of intelligence assessments. The SIO should be an 
all-source Scientific and Technical Intelligence analyst and 
would be responsible for the identification of collection 
requirements, the analysis of intelligence data, and the 
production of finished intelligence assessments on a world-wide 
basis. 


3. (S/NF) ADP Support: Over the period of time covered 
by this Plan, the ADP support activities of PAG-TA are 
anticipated to rise dramatically, requiring one full-time person 
to function as an ADP system administrator. Several factors 
justify this position: 


a. (S/NF) PAG-TA is currently in the process 
of upgrading its ADP system to include the acquisition of a Unix- 
based SUN workstation which will not only serve as the main 
system element, but will also be used to construct the 
Intelligence and the R&D databases, serve as the communications 
link to the external Contractor, and support the operation of 
special PAG-TA research equipment. Specific areas requiring 
specialized technical attention include: 


(1) Operating system(s) 

(2) Potential LAN(s) administration 

(3) Database construction/maintenance 

(4) Language compiler(s) 

(5) Peripherals 

(6) Equipment interfaces 

(7) Data communications 

(8) System modifications/upgrades 

(9) Development of special purpose 
software to support the PAG-TA mission 


SECRET 
NOT RELEASABLE TO FOREIGN NATIONALS 
STAR GATE 
LIMDIS 


I-2 


Approved For Release 2003/04/18 : CIA-RDP96-00789R002700020001-0 




b. (C) PAG-TA is located some distance from
the main Agency computer support facilities. Should the PAG-TA
system experience problems or failures, the system would be down
until someone from the main facility could travel to the PAG-TA
location to effect repairs, resulting in a loss of productivity
during the wait period. Also, any system modification/upgrades
would have to depend on the schedule of qualified personnel,
again resulting in loss of productivity. Therefore, it is
essential that a person with the necessary computer science
skills be physically located at the PAG-TA facility.


4. (S/NF/SG/LIMDIS) Branch Administration: Tasks in this 
functional area may be grouped as follows: 


a. Word Processing 
(1) Electronic Filing 
(2) Management Support 
(3) Security Administration 
(4) Report Generation/Document Preparation 
(5) RV Tasking 
(6) Generation of RV Target Pools 


b. Project/Contract Management 
c. Collection Management 
d. Ft. Meade Interface/Facilities 


5. (S/NF/SG/LIMDIS) Tasks in this area will require three 
to four personnel--a Branch Chief, a person functioning as an 
Assistant Branch Chief (probably the SIO), a Secretary and, 
possibly, a Collection Manager (unless this can be done on an "as 
required" basis by other Branch personnel). The Branch Chief and 
SIO should have experience in project/contract management, 
primarily to deal with external research/support contracts, as 
well as the ability to interface with the academic community and 
professional organizations engaged in parapsychological 
activities in addition to overall management skills associated 
with managing a Branch-size organization. 


(C) Based on this evaluation, a total of 11-12 
personnel could be required to effectively achieve PAG-TA goals. 
No attempt has been made to identify the personnel as either 
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military or civilian. This represents an increase of 1-2 
personnel over the current authorization. However, it may be
more desirable to keep the manning level at current strength (10 
authorized/7 assigned) and adjust the existing skill mix at PAG- 
TA to more effectively meet anticipated programmatic demands 
through personnel transfers/reassignments. 
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